P-Video-Animate animates a single image using the motion, timing, and camera movement from a given video.
Given a reference still and a driver video, the model animates the still with the clip’s motion. The output keeps the atmosphere of the still (look, lighting, wardrobe) while preserving the driver’s acting, timing, camera movement, and scene structure.
It is optimized for:
Top visual quality
Most efficient inference
Speed: 5.24 s generation time per 1 s of output video
Price: $0.03 per 1 s of 720p output and $0.06 per 1 s of 1080p output
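To budget a run, the published rates can be turned into a quick estimate. This is a minimal sketch, not a Pruna tool: the function name estimate_job is hypothetical, the rates are copied from the figures above, and real wall-clock time also depends on queue time, settings, and clip content. Because output length follows the driver video, the input is the driver's duration.

```python
# Published rates from this page; treat them as directional, not guaranteed.
GENERATION_SECONDS_PER_OUTPUT_SECOND = 5.24
PRICE_PER_OUTPUT_SECOND_USD = {"720p": 0.03, "1080p": 0.06}


def estimate_job(duration_s: float, resolution: str) -> tuple[float, float]:
    """Return (approximate generation time in s, price in USD) for one run.

    Output length follows the driver video, so pass the driver's duration.
    """
    if resolution not in PRICE_PER_OUTPUT_SECOND_USD:
        raise ValueError(f"unsupported resolution: {resolution!r}")
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    generation_s = duration_s * GENERATION_SECONDS_PER_OUTPUT_SECOND
    price_usd = duration_s * PRICE_PER_OUTPUT_SECOND_USD[resolution]
    # Round for display; this also hides float noise such as 0.30000000000000004.
    return round(generation_s, 2), round(price_usd, 2)


if __name__ == "__main__":
    # A 10 s driver clip rendered at 1080p.
    print(estimate_job(10, "1080p"))  # → (52.4, 0.6)
```

At these rates a 10 s clip costs about $0.30 at 720p and $0.60 at 1080p, with roughly 52 s of generation time either way.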
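For repeatable results, spell out a single subject, pose, wardrobe, lens, and aspect ratio in the still prompt; use the same driver clip on every run; and in instruction_prompt, name the subject and the motion beats and ask the model to match lip sync and audio.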
Fast pass: driver-led motion with a light still prompt and an empty ``instruction_prompt``.
Locked-in: a detailed still prompt plus an ``instruction_prompt`` with explicit motion and lip-sync rules.
Still / reference image: The API uses only the first entry in images. reference_image_prompt is not a request field; generate the still with P-Image (see Domain Use Cases there), then upload that file as the first entry in images.
Examples: “Photorealistic woman, 9:16 chest-up, soft window light, single subject”, “2D fitness mascot, cel-shaded, gym background, mouth open mid-hook”
video: Required driver clip. Motion, acting, timing, and camera movement come from this file.
Examples: a winning UGC take, avatar output, or storyboard reference performance.
instruction_prompt: Optional text that names the subject and the motion beats to preserve from the driver. Usually empty on a fast pass; add detail when identity drifts or lip sync needs tightening.
Fast pass: leave empty and let the driver carry performance.
Locked-in: “Animate the woman in the terracotta robe using the source video: she speaks to camera, lifts a serum bottle toward the lens, opens her palm, and nods. Match lip sync and audio from the source video.”
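The fields above can be assembled as in the following sketch. It only builds the input fields named on this page (images, video, instruction_prompt); how they are wrapped into an HTTP request, and whether they take URLs or upload references, are assumptions to check against the API Reference. The helper build_animate_input is hypothetical.

```python
def build_animate_input(still: str, driver_video: str, instruction_prompt: str = "") -> dict:
    """Assemble the input fields for one P-Video-Animate run (hypothetical helper).

    still and driver_video are assumed to be URLs or upload references that the
    API accepts; check the API Reference for the exact value types.
    """
    if not still or not driver_video:
        raise ValueError("both a reference still and a driver video are required")
    fields = {
        # Only the first entry in images is used, so send exactly one still.
        "images": [still],
        # Required driver clip: motion, acting, timing, and camera come from it.
        "video": driver_video,
    }
    prompt = instruction_prompt.strip()
    if prompt:
        # Locked-in: name the subject, the motion beats, and the sync rule.
        fields["instruction_prompt"] = prompt
    # Fast pass: no instruction_prompt, so the driver carries the performance.
    return fields


if __name__ == "__main__":
    print(build_animate_input("https://example.com/still.png", "https://example.com/driver.mp4"))
```

Omitting an empty instruction_prompt rather than sending "" is a design choice in this sketch; both should express a fast pass, but confirm how the API treats an empty string.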
| Slot | Fast pass (enough to run) | Locked-in (stronger control) |
|---|---|---|
| Still prompt (P-Image → images[0]) | One line: who, outfit, basic light; single subject implied. | Single subject stated explicitly; pose, wardrobe, expression, light direction, set, lens, and aspect ratio, for a stable identity across reruns. |
| video | Any clean driver with the motion you want to reuse. | The same clip on every run when comparing stills or prompt tweaks. |
| instruction_prompt | Empty; the driver carries motion and audio. | Subject, motion beats, and a sync line; call out hands and props, and match lip sync and audio from the source video. |
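A locked-in instruction_prompt follows a repeatable pattern: the subject, the motion beats in order, then a sync line. The hypothetical helper below reproduces the pattern of the locked-in example earlier on this page; it is a string-formatting convenience, not something the API requires.

```python
def compose_instruction_prompt(subject: str, beats: list[str], match_audio: bool = True) -> str:
    """Build a locked-in instruction_prompt: subject, ordered motion beats, sync line."""
    if not beats:
        raise ValueError("list at least one motion beat")
    if len(beats) == 1:
        motion = beats[0]
    elif len(beats) == 2:
        motion = f"{beats[0]} and {beats[1]}"
    else:
        # Serial list with a final "and", matching the example prompts on this page.
        motion = ", ".join(beats[:-1]) + ", and " + beats[-1]
    prompt = f"Animate {subject} using the source video: {motion}."
    if match_audio:
        prompt += " Match lip sync and audio from the source video."
    return prompt


if __name__ == "__main__":
    print(compose_instruction_prompt(
        "the woman in the terracotta robe",
        ["she speaks to camera", "lifts a serum bottle toward the lens", "opens her palm", "nods"],
    ))
```

Keeping the beats as a list makes it easy to add or reorder one beat between reruns while the driver clip and still stay fixed.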
Tip
For comprehensive video prompting (motion, framing, atmosphere), see the Video Generation guide.
How does it differ from other models?
Several models animate an image with the motion of a reference video. Among them, P-Video-Animate is the fastest and most cost-efficient without compromising on quality.
The benchmark numbers below support this claim. They are directional and may vary with resolution, clip length, settings, provider, queue time, and test date.
P-Video-Animate vs. P-Video-Replace
Both models take a source video and reference image(s), but they are built for different workflows and are not interchangeable.
| | P-Video-Animate | P-Video-Replace |
|---|---|---|
| What it does | Animates one image using motion, timing, and camera movement from a driver clip. | Replaces the character(s) in a video with the character(s) from reference stills. |
| What atmosphere you keep | The image’s atmosphere: the look, lighting, wardrobe, and world of the still drive the output. | The video’s atmosphere: the background, blocking, camera, and scene of the footage drive the output. |
| When to use it | You have an approved hero still and want it to perform like an existing take. | You have finished footage and want different people in the same shot. |
Rule of thumb: Choose Animate when the still defines the world; choose Replace when the clip defines the world.
Choosing the right video model
Pruna ships four performance video models. They share the same prediction API, but each solves a different production problem. P-Video-Animate requires an existing source video to drive your still.
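| | P-Video | P-Video-Avatar | P-Video-Animate | P-Video-Replace |
|---|---|---|---|---|
| Taken from the source video | | | Motion, timing, camera movement, and optionally audio | Camera, timing, blocking, background |
| Typical ask | “Make a 10 s product ad in this style.” | “This spokesperson says this line in French.” | “Animate this catalog still using our winning ad take.” | “Put our creator in this UGC b-roll.” |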
Quick decision guide
No source video yet → use P-Video to create the plate, or P-Video-Avatar if you only need a talking head from a still.
Footage exists and the hero still should move like the driver → P-Video-Animate (only the first entry in images is used). See P-Video-Animate vs. P-Video-Replace.
Footage exists and you need different people in the same shot → P-Video-Replace (model: p-video-replace; use instruction_prompt when multiple people are on screen).
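The quick decision guide can be encoded as a small routing function. This is a sketch of the guide's logic only: the Goal labels and the function name choose_video_model are hypothetical, and it returns display names rather than API model identifiers (only p-video-replace is named on this page).

```python
from enum import Enum


class Goal(Enum):
    """Hypothetical labels for the asks in the decision guide."""
    NEW_CLIP = "new_clip"          # nothing filmed yet; create the plate
    TALKING_HEAD = "talking_head"  # a still that speaks new lines
    MOVE_STILL = "move_still"      # hero still should perform like an existing take
    SWAP_PEOPLE = "swap_people"    # same shot, different people


def choose_video_model(has_source_video: bool, goal: Goal) -> str:
    """Return the model name suggested by the quick decision guide."""
    if goal is Goal.TALKING_HEAD:
        # New lines from a still: Avatar creates speech, Animate does not.
        return "P-Video-Avatar"
    if not has_source_video or goal is Goal.NEW_CLIP:
        # Animate and Replace both need footage; generate it first.
        return "P-Video"
    if goal is Goal.SWAP_PEOPLE:
        # The clip defines the world; change who is in it.
        return "P-Video-Replace"
    # The still defines the world; borrow the driver's performance.
    return "P-Video-Animate"
```

Routing on "which side defines the world" mirrors the Animate vs. Replace rule of thumb above.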
Note
P-Video-Avatar vs. P-Video-Animate: Avatar creates speech from a still (TTS or uploaded audio, lip-sync-focused). Animate copies motion from a driver video onto your still; it does not write a new script. Use Avatar for new lines; use Animate when the timing and performance of an existing take should drive the still.
Tip
Common pipelines: Generate stills with P-Image → P-Video-Avatar for new spokesperson clips → P-Video-Replace to drop talent into b-roll → P-Video-Animate to apply a hero still to motion from an avatar or ad clip.
Benchmarked at 5.24 s of generation time per 1 s of output, roughly 7× faster than typical motion-transfer alternatives, at a fraction of the per-second cost.
Single-image motion transfer
Upload a video and one image to retarget motion from a driver clip onto an approved still.
Long-form friendly at 720p
Supports videos up to ~2 minutes at 720p (subject to platform limits; confirm in your account).
Reliable everyday motion
Strong on normal movement and slow, controlled action—walking, talking heads, presenters, product demos.
Motion and scene structure preservation
Follows the source clip’s acting, timing, camera movement, and layout while applying the reference image’s look.
Audio-aware output
Control source audio with save_audio and ignore_audio (see Configuration).
Practical constraints
Output length follows the source video duration (within platform max length).
Output aspect ratio follows the source video.
Very fast action, heavy occlusion, or extreme camera motion may reduce consistency.
Use a clean, front-facing reference still when possible.
Only the first image in images is used.
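The constraints above can be checked before submitting a job. This minimal sketch is an assumption-laden preflight, not a platform feature: the 120 s figure comes from the "~2 minutes at 720p" guidance and should be confirmed for your account, no 1080p limit is checked because none is stated here, and the names preflight and output_aspect are hypothetical.

```python
from math import gcd

# "Up to ~2 minutes at 720p"; confirm the actual limit for your account.
MAX_720P_SECONDS = 120


def preflight(images: list[str], driver_duration_s: float, resolution: str) -> list[str]:
    """Return warnings for a planned run, based on the practical constraints."""
    if not images:
        raise ValueError("images needs at least one reference still")
    warnings = []
    if len(images) > 1:
        warnings.append(f"only images[0] is used; {len(images) - 1} extra image(s) ignored")
    if resolution == "720p" and driver_duration_s > MAX_720P_SECONDS:
        warnings.append(f"{driver_duration_s:g} s driver exceeds the ~{MAX_720P_SECONDS} s 720p guidance")
    # No 1080p length limit is stated on this page, so none is checked here.
    return warnings


def output_aspect(driver_width: int, driver_height: int) -> str:
    """Output aspect ratio follows the driver video; reduce its frame size to W:H."""
    divisor = gcd(driver_width, driver_height)
    return f"{driver_width // divisor}:{driver_height // divisor}"
```

Since the output aspect ratio and length both come from the driver, checking the driver file is usually enough to predict the shape of the result.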
Examples
The examples are grouped into one tab per Hugging Face folder (ugc_ads, film_casting, gaming, meme_remixes). Each card shows the reference image, the driver video, a driver ↔ output compare clip, the resolution, and copy-ready prompts. The still text is a P-Image prompt (see Domain Use Cases there for tone and structure): use it in P-Image to create the frame, then upload that file as the first entry in images. The label reference_image_prompt in the copy blocks is documentation shorthand; it is not a request field.
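Because a copy block mixes a P-Image prompt with a P-Video-Animate field, it helps to split them before use. The sketch below is hypothetical (split_copy_block is not a Pruna helper) and assumes each key starts its own line followed by a colon, as in the cards on this page.

```python
# Keys used in the copy blocks on this page.
COPY_BLOCK_KEYS = ("instruction_prompt", "reference_image_prompt")


def split_copy_block(text: str) -> tuple[str, str]:
    """Split a copy block into (P-Image still prompt, instruction_prompt).

    reference_image_prompt goes to P-Image, never to the P-Video-Animate request.
    """
    parsed = {}
    for line in text.splitlines():
        # Split on the first colon only, so values like "9:16" stay intact.
        key, sep, value = line.partition(":")
        if sep and key.strip() in COPY_BLOCK_KEYS:
            parsed[key.strip()] = value.strip()
    return parsed.get("reference_image_prompt", ""), parsed.get("instruction_prompt", "")
```

Feed the first value to P-Image, upload the resulting still, and send only the second value as instruction_prompt.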
1080p, 9:16
Example prompts
instruction_prompt: Animate the Filipina woman in the terracotta robe using the source video: she speaks to camera, lifts a serum bottle toward the lens, opens her palm, raises her eyebrows, and nods. Match lip sync and audio from the sour…
reference_image_prompt: Vertical 9:16 three-quarter angle from slightly below, chest-up in a bathroom, Chest-up facing camera, both hands empty and relaxed at sides, mouth slightly open as if mid-sentence, shoulders squared. Photorealistic Fil…
1080p, 9:16
Example prompts
instruction_prompt: Animate the 2D illustrated fitness woman in the coral visor and black windbreaker using the source video: she leans toward camera, points forward, opens both hands in a listen-up gesture, then shakes her head with a hal…
reference_image_prompt: Vertical 9:16 three-quarter front chest-up in a gym, Three-quarter front toward camera, slight lean inward, both arms empty and relaxed at sides, mouth open mid-hook delivery. Clean 2D flat-vector animation portrait of…
1080p, 9:16
Example prompts
instruction_prompt: Animate the Black man in the grey tank top using the source video: he keeps both hands on the shaker bottle at his chest throughout, lifts the bottle toward the camera with both hands still gripping it without shaking,…
reference_image_prompt: Vertical 9:16 straight-on chest-up on an outdoor running track at midday under open sky, Chest-up facing camera, exactly one white shaker bottle centered at chest with both hands gripping the same single bottle, forearm…
1080p, 9:16
Example prompts
instruction_prompt: Animate the South Asian woman in the cream cardigan using the source video: she taps her phone screen, lifts the phone toward the camera, opens her left palm, and nods with a half-smile. Match lip sync and audio from th…
reference_image_prompt: Vertical 9:16 over-shoulder three-quarter chest-up in a sunlit apartment living room, Over-shoulder three-quarter toward camera, phone held at chest in right hand with screen facing lens, left hand empty at side, mouth…
1080p, 9:16
Example prompts
instruction_prompt: Animate the East Asian man in the navy apron using the source video: he lifts the meal box lid halfway, peels back the paper liner to reveal portions, raises his eyebrows with a grin, then shrugs. Match lip sync and aud…
reference_image_prompt: Vertical 9:16 high downshot chest-up over a kitchen counter, High downshot chest-up over a kitchen counter, both palms flat on a closed meal box lid at center frame, elbows on counter edge, mouth open mid-hook, shoulder…
1080p, 9:16
Example prompts
instruction_prompt: Animate the 3D cel-shaded wellness mascot in the sage wrap top using the source video: she speaks with lip sync in a tight face close-up, leans slightly toward camera, tilts her head with a smile, shrugs her shoulders w…
reference_image_prompt: 3D cel-shaded CGI mascot portrait only, Pixar-style mature adult woman age 25, not a photograph, not photorealistic, not a child not a toddler not a baby. Vertical 9:16 macro face-only close-up on a sunrise rooftop terr…
1080p, 9:16
Example prompts
instruction_prompt: Animate the woman in the sand hijab and camel coat using the source video: she gives a small fist pump, turns her head slightly toward camera, opens her palm outward, then nods with a smile. Match lip sync and audio fro…
reference_image_prompt: Vertical 9:16 profile chest-up on a commuter train window seat, Profile chest-up facing right, wireless earbuds visible, both hands empty at sides, mouth open mid-sentence, chin slightly lifted. Photorealistic Middle Ea…
1080p, 9:16
Example prompts
instruction_prompt: Animate the redhead woman in the sage robe using the source video: she speaks to camera with lip sync, runs her fingers through her hair with one hand, flips her hair toward the camera, briefly frames her face with both…
reference_image_prompt: Vertical 9:16 eye-level straight-on chest-up in a bathroom vanity nook, Eye-level straight-on chest-up facing camera, both arms hanging straight down at sides with hands relaxed beside hips, no hands raised, no hands to…
1080p, 9:16
Example prompts
instruction_prompt: Animate the anime-style streamer in the violet hoodie using the source video: he slaps the desk edge, points toward the monitor, leans back laughing with palms up, then gives a quick double fist pump with lip sync. Matc…
reference_image_prompt: Vertical 9:16 low-angle chest-up at a neon-lit gaming desk, Low-angle chest-up facing camera, both hands resting on desk edge below frame, mouth open mid-reaction, shoulders pulled back in disbelief. Original fictional…
1080p, 16:9
Example prompts
instruction_prompt: Animate the animated storybook heroine in the teal ball gown using the source video: she speaks with natural lip sync, lifts her eyes upward, presses her hands closer to her chest, then settles into a gentle smile and n…
reference_image_prompt: Horizontal 16:9 chest-up three-quarter shot in a moonlit enchanted glade, Chest-up three-quarter toward camera, both hands clasped at center chest, shoulders soft, mouth slightly open in quiet wonder, eyes bright. Origi…
1080p, 16:9
Example prompts
instruction_prompt: Animate the claymation sailor using the source video: he blinks and lifts his head, stretches both arms overhead, lowers them, looks left and back to camera, then exhales with a small smile. Match lip sync and audio fro…
reference_image_prompt: Horizontal 16:9 eye-level medium full-body shot on a handmade miniature dock set, Eye-level medium full-body front-facing, clay figure seated on miniature stool, both arms resting at sides with visible clay thumbprints,…
1080p, 16:9
Example prompts
instruction_prompt: Animate the watercolor storybook knight in the emerald tabard using the source video: he speaks with natural lip sync, lifts his chin slightly, presses his hand firmer to his chest, then holds a small nod with steady ey…
reference_image_prompt: Horizontal 16:9 chest-up straight-on shot on a watercolor storybook castle rampart, Chest-up straight-on toward camera, right hand over heart, left arm at side, shoulders squared, mouth closed with a solemn half-smile,…
1080p, 16:9
Example prompts
instruction_prompt: Animate the frost-mage commander in the cobalt battle coat using the source video: he keeps both hands on the map table, sweeps his right hand left across the projection, returns it flat to the table …
reference_image_prompt: Horizontal 16:9 chest-up three-quarter shot over a frost-lit ice citadel war-room holographic map table, Chest-up three-…
1080p, 16:9
Example prompts
instruction_prompt: Animate the ice paladin using the source video: she stands in neutral idle with empty hands at her sides, shifts weight back, raises both hands into ready guard, and holds the pose. Match lip sync and…
reference_image_prompt: Horizontal 16:9 eye-level full-body front-facing hero key art, First-frame pose and camera locked — match the driver vid…
1080p, 16:9
Example prompts
instruction_prompt: Animate the crystal dragon using the source video: it holds jaws closed and wings half-spread, stalks forward with heavy footfalls, spreads its wings wider, lifts its head slightly, and leans forward …
reference_image_prompt: Horizontal 16:9 low-angle boss reveal shot, First-frame pose and camera locked — match the driver video frame 0 exactly …
1080p, 16:9
Example prompts
instruction_prompt: Animate the crystal knight in iridescent plate armor using the source video: she holds a crouch with her fist on the ground, pushes up to stand, snaps her right fist to her chest with her left arm bac…
reference_image_prompt: Horizontal 16:9 low-angle full-body shot on a stormy floating skybridge landing pad above cloud sea, Low-angle full-body…
1080p, 9:16
Example prompts
instruction_prompt: Animate the stylized gacha arcanist heroine in the indigo velvet robes using the source video: she speaks to camera with natural lip sync throughout, opens her right palm briefly, raises her right fis…
reference_image_prompt: Vertical 9:16 eye-level chest-up shot in a violet crystal gacha summon hall with floating banner light orbs, Vertical ch…
1080p, 9:16
Example prompts
instruction_prompt: Animate the rogue in the teal jacket using the source video: she keeps both hands clasped at her chest, pulls them apart, forms a dazzled grin, leans forward slightly, and nods once. Match audio from …
reference_image_prompt: Vertical 9:16 eye-level chest-up shot in a neon gacha summon chamber with holographic loot cards, Vertical chest-up fron…
1080p, 9:16
Example prompts
instruction_prompt: Animate the ranger in copper scale mail and antler pauldrons using the source video: she holds a fist at her chest, takes two steps forward, raises her fist higher, and holds with a subtle shoulder fl…
reference_image_prompt: Vertical 9:16 wide medium-long full-body shot on a castle rampart at golden hour, generous headroom and footroom with vi…
1080p, 16:9
Example prompts
instruction_prompt: Animate the oil-painting noblewoman in the emerald velvet gown using the source video: she turns sharply to her left, brings both hands to her mouth in a gasp, inhales with raised shoulders, then hold…
reference_image_prompt: Horizontal 16:9 medium full-body three-quarter shot in a candlelit baroque hall, First-frame pose and camera locked — ma…
1080p, 16:9
Example prompts
instruction_prompt: Animate the upright capybara in the olive cardigan using the source video: it keeps both paws on the mug, snaps its head back to face camera with raised eyebrows, leans back slightly with widened eyes…
reference_image_prompt: Horizontal 16:9 medium chest-up straight-on shot in a rain-streaked corner cafe booth, First-frame pose and camera locke…
1080p, 16:9
Example prompts
instruction_prompt: Animate the upright bulldog in the charcoal three-piece suit using the source video: it keeps both arms open wide, takes two confident steps forward, raises one arm straight up with the other on its h…
reference_image_prompt: Horizontal 16:9 wide eye-level full-body shot on a foggy Thames riverside promenade at dawn, First-frame pose and camera…
1080p, 16:9
Example prompts
instruction_prompt: Animate the photorealistic golden retriever in the emerald blazer behind the desk using the source video: its mouth and jaw follow the driver's speech with accurate lip sync throughout, speaking the l…
reference_image_prompt: Horizontal 16:9 locked static eye-level chest-up webcam shot, photorealistic golden retriever seated behind an office de…
1080p, 9:16
Example prompts
instruction_prompt: Animate the freckled woman in the lavender cardigan using the source video: she keeps both hands beside her ears touching the oversized vintage dangle earrings only—never bananas, fruit, or peel props…
reference_image_prompt: Vertical 9:16 eye-level chest-up shot in a vintage thrift-store fitting room with warm tungsten bulbs, Vertical chest-up…
1080p, 9:16
Example prompts
instruction_prompt: Animate the upright sloth in the lavender button-down using the source video: it rolls its eyes upward slowly, returns to a flat stare, lifts its shoulders into a shrug with palms up, then drops the s…
reference_image_prompt: Vertical 9:16 eye-level chest-up shot in a fluorescent open-plan office with glass partitions, Vertical chest-up front-f…
1080p, 9:16
Example prompts
instruction_prompt: Animate the woman in the sage sweater and tortoiseshell glasses using the source video: she already holds the plate at chest height with both hands from the first frame, keeps both hands firmly on the…
reference_image_prompt: Vertical 9:16 eye-level chest-up shot in a bright home living room with gray sofa and white wall paneling blurred behind…
1080p, 9:16
Example prompts
instruction_prompt: Animate the upright ferret in the yellow windbreaker and rain boots using the source video: it bounces in a criss-cross leg pattern with opposite arm swings, widens the arm swing slightly, then holds …
reference_image_prompt: Vertical 9:16 eye-level full-body shot on an empty subway platform with yellow safety line, Vertical full-body front-fac…
Integration
P-Video-Animate uses the same Pruna prediction API as P-Video. Upload a video and one image, then poll for the result or use the sync headers, as with the other video models.
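If you poll rather than use the sync headers, the loop can look like the sketch below. It is hedged heavily: the HTTP call is left to a caller-supplied fetch_status function, the status values "succeeded" and "failed" and the shape of the returned dict are placeholders rather than Pruna's documented response, and wait_for_prediction is a hypothetical name. A sensible timeout can be sized from the 5.24 s per output second figure plus room for queue time.

```python
import time
from typing import Callable


def wait_for_prediction(
    fetch_status: Callable[[], dict],
    interval_s: float = 5.0,
    timeout_s: float = 900.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the prediction finishes (sketch; status names are placeholders).

    fetch_status is a caller-supplied function that performs the real HTTP call
    described in the API Reference and returns the parsed JSON body.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        state = status.get("status")
        if state == "succeeded":  # placeholder value; check the API Reference
            return status
        if state == "failed":  # placeholder value; check the API Reference
            raise RuntimeError(f"prediction failed: {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no result after {timeout_s:g} s")
        sleep(interval_s)
```

Injecting fetch_status and sleep keeps the loop independent of any HTTP client and easy to test with canned responses.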
Tip
For more information on how to use the API, see the API Reference.