P-Video-Animate
P-Video-Animate animates a single image using the motion, timing, and camera movement from a given video.
Given a reference image and a reference video, the model generates a new video that (1) takes its style from the reference image and (2) preserves the original motion, acting, timing, camera movement, and scene structure of the reference video.
It is optimized for:
- Top visual quality
- Most efficient inference
- Speed: 5.24 s generation time per 1 s of output video
- Price: $0.03 per 1 s of 720p video and $0.06 per 1 s of 1080p video
Not sure how this differs from P-Video or P-Video-Avatar? See Choosing the right video model below.
Note
When using P-Video-Animate, respect the copyright of videos and images you use as input and of the video you generate.
Pricing:
| Resolution | Price |
|---|---|
| 720p | $0.03 per second of output video |
| 1080p | $0.06 per second of output video |
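Because pricing is per second of output, the cost of a render can be estimated from its resolution and length. The sketch below is only an illustration: `estimate_cost` is a hypothetical helper (not part of the Pruna API), and the rates are copied from the table above and may change.

```shell
# Hypothetical helper: estimate the price of one render in USD.
# Rates are copied from the pricing table above; check current pricing before relying on them.
estimate_cost() {
  resolution="$1"   # "720p" or "1080p"
  seconds="$2"      # length of the output video in seconds
  case "$resolution" in
    720p)  rate=0.03 ;;
    1080p) rate=0.06 ;;
    *) echo "unsupported resolution: $resolution" >&2; return 1 ;;
  esac
  # awk handles the decimal arithmetic that plain shell arithmetic cannot.
  awk -v r="$rate" -v s="$seconds" 'BEGIN { printf "%.2f\n", r * s }'
}

estimate_cost 720p 10    # prints 0.30
estimate_cost 1080p 30   # prints 1.80
```

Since output length follows the source video duration, the length of the driver clip is the number to pass as the second argument.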
Tip
Test it in the P-Video Playground.
Prompt formula
Fast pass
Driver clip carries motion—good for first renders and timing checks.
Locked-in
Spell out identity, motion beats, and audio sync for repeatable runs.
- Still / reference image: The API uses the first entry in `images`. `reference_image_prompt` is not a request field; generate the still with P-Image (see Domain Use Cases there), then upload the file as `image`. Examples: “Photorealistic woman, 9:16 chest-up, soft window light, single subject”, “2D fitness mascot, cel-shaded, gym background, mouth open mid-hook”
- `video`: Required driver clip. Motion, acting, timing, and camera movement come from this file. Examples: a winning UGC take, avatar output, or storyboard reference performance.
- `instruction_prompt`: Optional text that names the subject and the motion beats to preserve from the driver. Usually empty on a fast pass; add detail when identity drifts or lip sync needs tightening.
  - Fast pass: leave empty and let the driver carry performance.
  - Locked-in: “Animate the woman in the terracotta robe using the source video: she speaks to camera, lifts a serum bottle toward the lens, opens her palm, and nods. Match lip sync and audio from the source video.”
| Slot | Fast pass (enough to run) | Locked-in (stronger control) |
|---|---|---|
| Still prompt (P-Image → `image`) | One line: who, outfit, basic light; single subject implied. | Single subject explicit; pose, wardrobe, expression, light direction, set, lens + aspect, for stable identity across reruns. |
| `video` | Any clean driver with the motion you want to reuse. | Same clip every run when comparing stills or prompt tweaks. |
| `instruction_prompt` | Empty; the driver carries motion and audio. | Subject + motion beats + sync line; call out hands, props, and match lip sync and audio from the source video. |
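To make the Locked-in column concrete, here is a minimal sketch that assembles a request body carrying an `instruction_prompt`. The helpers `json_escape` and `build_locked_in_payload` are hypothetical names introduced for illustration; the field names are the ones documented on this page, and the file URLs are the placeholders used in the Integration example.

```shell
# Escape backslashes and double quotes so free text can sit inside a JSON string.
# Simplification: newlines and other control characters are not handled.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Hypothetical helper: print a request body for a locked-in run.
# Field names come from the Configuration section; the function itself is illustrative.
build_locked_in_payload() {
  video_url="$1"   # URL of the uploaded driver clip
  image_url="$2"   # URL of the uploaded still (first entry in images)
  prompt="$3"      # subject + motion beats + sync line
  printf '{"input":{"video":"%s","images":["%s"],"instruction_prompt":"%s","save_audio":true}}\n' \
    "$video_url" "$image_url" "$(json_escape "$prompt")"
}

build_locked_in_payload \
  "https://api.pruna.ai/v1/files/file-video123" \
  "https://api.pruna.ai/v1/files/file-portrait" \
  "Animate the woman in the terracotta robe using the source video. Match lip sync and audio from the source video."
```

The printed JSON can be piped into the prediction request shown under Integration by passing `-d @-` to curl instead of an inline body. For a fast pass, the same body without `instruction_prompt` should be enough.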
Tip
For comprehensive video prompting (motion, framing, atmosphere), see the Video Generation guide.
How does it differ from other models?
Several models can animate an image with the motion of a reference video. Among them, P-Video-Animate is the fastest and most cost-efficient, without compromising on quality.
The benchmark numbers below support this claim. They are directional and may vary depending on resolution, clip length, settings, provider, queue time, and test date.
Choosing the right video model
Pruna ships three performance video models. They share the same prediction API, but each solves a different production problem. P-Video-Animate requires an existing source video to drive your still.
| | P-Video | P-Video-Avatar | Animate (this page) |
|---|---|---|---|
| One-line job | Generate new footage from prompts | Speak from one still (script or audio) | Retarget one still with clip motion |
| You start with | Text prompt (+ optional image refs) | Portrait still + | Source video + one still |
| You keep from the source | N/A (new scene) | Aspect ratio of the still | Motion, timing, camera movement, and optionally audio |
| Typical ask | “Make a 10 s product ad in this style.” | “This spokesperson says this line in French.” | “Animate this catalog still using our winning ad take.” |
Quick decision guide
No source video yet → use P-Video to create the plate, or P-Video-Avatar if you only need a talking head from a still.
You already have footage and want one approved still to follow that clip’s motion → P-Video-Animate (only the first entry in `images` is used).
Note
P-Video-Avatar vs. animate: Avatar creates speech from a still (TTS or uploaded audio, lip-sync-focused). Animate copies motion from a driver video onto your still—it does not write a new script. Use avatar for new lines; use animate when the timing and performance of an existing take should drive the still.
Tip
Common pipelines: Generate stills with P-Image → P-Video-Avatar for new spokesperson clips → P-Video-Animate to apply a hero still to motion from an avatar or ad clip.
Speed and throughput
| Metric | P-Video-Animate (720p benchmark) |
|---|---|
| Generation time per 1 s of output | 5.24 s |
| Wall-clock for 5 s output | 26.2 s |
| Typical motion-transfer alternatives (1 s output) | ~36.0–43.0 s |
| Typical motion-transfer alternatives (5 s output) | ~180.0–215.0 s |
| Price per second (720p) | $0.03 (vs. ~$0.07–$0.35 for comparable tools) |
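The 5 s figure in the table is the per-second rate scaled linearly (5.24 × 5 = 26.2). As a rough planning aid, the sketch below applies the same scaling to other clip lengths. It assumes generation time stays linear in output length and ignores queue time; `estimate_render_seconds` is a hypothetical helper, not an API call.

```shell
# Hypothetical helper: estimate generation time in seconds for a clip,
# assuming the 720p benchmark rate of 5.24 s per 1 s of output scales linearly.
estimate_render_seconds() {
  output_seconds="$1"   # length of the output video in seconds
  awk -v s="$output_seconds" 'BEGIN { printf "%.1f\n", 5.24 * s }'
}

estimate_render_seconds 5     # prints 26.2, matching the table
estimate_render_seconds 120   # prints 628.8 for a clip of about 2 minutes
```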
Key features
P-Video-Animate fits the same Pruna API patterns as P-Video and P-Video-Avatar:
- Fastest cost-efficient motion transfer
Benchmarked at 5.24 s per 1 s of output—roughly 7× faster than typical motion-transfer alternatives at a fraction of the per-second cost.
- Single-image motion transfer
Upload a video and one image to retarget an approved still with the motion of a driver clip.
- Long-form friendly at 720p
Supports videos up to ~2 minutes at 720p (subject to platform limits; confirm in your account).
- Reliable everyday motion
Strong on normal movement and slow, controlled action—walking, talking heads, presenters, product demos.
- Motion and scene structure preservation
Follows the source clip’s acting, timing, camera movement, and layout while applying the reference image’s look.
- Audio-aware output
Control source audio with `save_audio` and `ignore_audio` (see Configuration).
Practical constraints
Output length follows the source video duration (within platform max length).
Output aspect ratio follows the source video.
Very fast action, heavy occlusion, or extreme camera motion may reduce consistency.
Use a clean, front-facing reference still when possible.
Only the first image in `images` is used.
Examples
The examples are organized as one tab per Hugging Face folder (`ugc_ads`, `film_casting`, `gaming`), with side-by-side cards showing the reference image (`image`), driver (`video`), a driver ↔ output compare clip, resolution, and copy-ready prompts. The still text is a P-Image prompt (see Domain Use Cases there for tone and structure): use it in P-Image to create the frame, then upload that file as `image` in the API. The label `reference_image_prompt` in the copy blocks is documentation shorthand; it is not a request field.
Integration
P-Video-Animate uses the same Pruna prediction API as P-Video. Upload video and one image, then poll or use sync headers as with other video models.
Tip
For more information on how to use the API, see the API Reference.
- API Endpoint
Base URL:
https://api.pruna.ai/v1/predictions
Authentication
-H 'apikey: YOUR_API_KEY'
Step 1: Upload source video and reference image
# Upload the source (driver) video
curl -X POST "https://api.pruna.ai/v1/files" \
  -H "apikey: YOUR_API_KEY" \
  -F "content=@/path/to/source.mp4"

# Upload the reference image
curl -X POST "https://api.pruna.ai/v1/files" \
  -H "apikey: YOUR_API_KEY" \
  -F "content=@/path/to/reference.jpg"
Use the returned file URLs as `video` and as the first entry in `images`.
Step 2: Create generation request
Animate mode (synchronous)
curl -X POST 'https://api.pruna.ai/v1/predictions' \
  -H 'Content-Type: application/json' \
  -H 'apikey: YOUR_API_KEY' \
  -H 'Model: p-video-animate' \
  -H 'Try-Sync: true' \
  -d '{
    "input": {
      "video": "https://api.pruna.ai/v1/files/file-video123",
      "images": [
        "https://api.pruna.ai/v1/files/file-portrait"
      ],
      "resolution": "720p",
      "fps": 24,
      "save_audio": true,
      "seed": 42
    }
  }'
Configuration
Required parameters
| Parameter | Type | Description |
|---|---|---|
| `video` | file/string | Source RGB video ( |
| `images` | file[] / string[] | Reference image(s). Animate uses the first image only. |
Optional parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `instruction_prompt` | string | | Usually left empty for animate. |
| `resolution` | string | | Target megapixel budget: |
| `fps` | integer | 24 | Frames per second of the output video. |
| `save_audio` | boolean | | Save the video with audio. |
| `ignore_audio` | boolean | | Ignore source audio for prompt conditioning and return a silent output video. |
| `disable_safety_checker` | boolean | | Disable safety checker for generated videos (platform UI may still enforce checks). |
| `seed` | integer | random | Random seed. Leave blank for random. |
| `no_op` | boolean | | Health check mode: returns status without inference. |
Supported option values
`resolution`: `720p`, `1080p`.
Argument recommendations
Use these patterns for consistent quality:
- `video`: prefer stable exposure, minimal motion blur, and clear visibility of the motion you want to transfer.
- `images`: high-resolution, well-lit, front-facing still matching the intended framing; only the first image is used.
- `resolution`: iterate in `720p`, then rerun finals in `1080p`.
- `fps`: match source footage when possible; default `24` is fine for most web delivery.
- `save_audio` / `ignore_audio`: keep `save_audio: true` for dialogue-driven clips; set `ignore_audio: true` when you only need motion without sound.
- `seed`: set for reproducible A/B tests; change one variable at a time.
- `disable_safety_checker`: leave default unless your workflow includes explicit moderation.
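The draft-then-final pattern can be sketched as two request bodies that differ only in `resolution`, with the seed held fixed so that one variable changes at a time. `build_payload` is a hypothetical helper introduced for illustration; the field names and the placeholder file URLs are taken from the Integration example.

```shell
# Hypothetical helper: print a request body with the resolution and seed made explicit.
build_payload() {
  video_url="$1"
  image_url="$2"
  resolution="$3"   # "720p" for drafts, "1080p" for finals
  seed="$4"         # fixed non-negative integer so reruns are comparable
  case "$resolution" in
    720p|1080p) ;;
    *) echo "unsupported resolution: $resolution" >&2; return 1 ;;
  esac
  case "$seed" in
    ''|*[!0-9]*) echo "seed must be a non-negative integer: $seed" >&2; return 1 ;;
  esac
  printf '{"input":{"video":"%s","images":["%s"],"resolution":"%s","fps":24,"save_audio":true,"seed":%s}}\n' \
    "$video_url" "$image_url" "$resolution" "$seed"
}

VIDEO="https://api.pruna.ai/v1/files/file-video123"
IMAGE="https://api.pruna.ai/v1/files/file-portrait"

build_payload "$VIDEO" "$IMAGE" 720p 42    # draft pass
build_payload "$VIDEO" "$IMAGE" 1080p 42   # final pass: only the resolution changed
```

Validating the arguments before building the body keeps a typo such as `4k` from reaching the API as a billed or failed request.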