P-Video-Animate

P-Video-Animate animates a single image using the motion, timing, and camera movement of a driver video.

Given a reference still and a driver video, the model animates the still with the clip’s motion. The output keeps the atmosphere of the still (look, lighting, wardrobe) while preserving the driver’s acting, timing, camera movement, and scene structure.

It is optimized for:

- Top visual quality
- Fast, cost-efficient inference:
  - Speed: 5.24 s of generation time per 1 s of output video (720p benchmark)
  - Price: $0.03 per second of 720p output and $0.06 per second of 1080p output

Not sure how this differs from P-Video-Replace, P-Video, or P-Video-Avatar? See P-Video-Animate vs. P-Video-Replace and Choosing the right video model below.

Note

When using P-Video-Animate, respect the copyright of the videos and images you use as input and of the videos you generate.

Pricing:

Resolution	Price
720p	$0.03 per second of output video
1080p	$0.06 per second of output video
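Because output length follows the source video, the charge for a run can be estimated from the driver clip’s duration. The sketch below assumes a flat per-second rate with no minimum charge and no partial-second rounding, neither of which is documented here; `estimate_cost_usd` is a hypothetical helper, not part of any Pruna SDK.

```python
# Per-second output prices from the pricing table, in US cents.
# Integer cents keep the arithmetic exact until the final conversion.
PRICE_CENTS_PER_SECOND = {"720p": 3, "1080p": 6}


def estimate_cost_usd(output_seconds: float, resolution: str) -> float:
    """Estimate the charge for one P-Video-Animate run.

    Assumption: a flat per-second rate on output duration, with no minimum
    charge and no rounding of partial seconds.
    """
    if resolution not in PRICE_CENTS_PER_SECOND:
        raise ValueError(f"unknown resolution {resolution!r}; expected '720p' or '1080p'")
    if output_seconds < 0:
        raise ValueError("output_seconds must be non-negative")
    cents = output_seconds * PRICE_CENTS_PER_SECOND[resolution]
    return round(cents / 100, 2)


if __name__ == "__main__":
    # Output length follows the driver clip, so a 10 s driver gives about 10 s of output.
    for res in ("720p", "1080p"):
        print(res, f"${estimate_cost_usd(10, res):.2f}")
```

For a 10 s driver this prints $0.30 at 720p and $0.60 at 1080p.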

Tip

Test it in the P-Video Playground.

Prompt formula

Fast pass

The driver clip carries the motion. Good for first renders and timing checks.

- [p-image still]: one line covering who, outfit, aspect, and soft light
- [video]: upload a winning ad or avatar take
- [instruction_prompt]: usually empty

Locked-in

Spell out identity, motion beats, and audio sync for repeatable runs.

- [p-image still]: single subject, with pose, wardrobe, lens, and aspect spelled out
- [video]: the same driver clip every run
- [instruction_prompt]: name the subject and the motion beats, then ask to match lip sync and audio

Fast pass (still ↔ output comparison): driver-led motion with a light still prompt and an empty `instruction_prompt`.

Locked-in (still ↔ output comparison): a detailed still plus an `instruction_prompt` with explicit motion and lip-sync rules.

Still / reference image: The API uses only the first entry in `images`. `reference_image_prompt` is not a request field; generate the still with P-Image (see Domain Use Cases there), then upload the file as the first entry in `images`.
- Examples: “Photorealistic woman, 9:16 chest-up, soft window light, single subject”, “2D fitness mascot, cel-shaded, gym background, mouth open mid-hook”
`video`: Required driver clip. Motion, acting, timing, and camera movement come from this file.
- Examples: a winning UGC take, avatar output, or storyboard reference performance.
`instruction_prompt`: Optional text that names the subject and the motion beats to preserve from the driver. Usually empty on a fast pass; add detail when identity drifts or lip sync needs tightening.
- Fast pass: leave empty and let the driver carry the performance.
- Locked-in: “Animate the woman in the terracotta robe using the source video: she speaks to camera, lifts a serum bottle toward the lens, opens her palm, and nods. Match lip sync and audio from the source video.”
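The locked-in example above follows a fixed shape: subject, motion beats, then a sync line. A minimal sketch of that template is below. `compose_instruction_prompt` is a hypothetical helper; the `lip_sync=False` variant mirrors examples on this page where the sync line asks only for audio (such as the crystal dragon). For a fast pass, skip it and leave the field empty.

```python
def compose_instruction_prompt(subject: str, beats: list[str], lip_sync: bool = True) -> str:
    """Build a locked-in instruction_prompt: subject, motion beats, sync line.

    Hypothetical helper; it only reproduces the wording pattern used in the
    examples on this page.
    """
    if not beats:
        raise ValueError("list at least one motion beat")
    # Join beats as an English list with a serial comma: "a, b, and c".
    if len(beats) == 1:
        motion = beats[0]
    elif len(beats) == 2:
        motion = f"{beats[0]} and {beats[1]}"
    else:
        motion = ", ".join(beats[:-1]) + f", and {beats[-1]}"
    # Some examples (non-speaking subjects) ask only for audio, not lip sync.
    sync = (
        "Match lip sync and audio from the source video."
        if lip_sync
        else "Match audio from the source video."
    )
    return f"Animate the {subject} using the source video: {motion}. {sync}"


if __name__ == "__main__":
    print(compose_instruction_prompt(
        "woman in the terracotta robe",
        ["she speaks to camera", "lifts a serum bottle toward the lens",
         "opens her palm", "nods"],
    ))
```

Run as shown, it reproduces the locked-in example prompt word for word.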

Slot	Fast pass (enough to run)	Locked-in (stronger control)
Still prompt (P-Image → `images[0]`)	One line: who, outfit, basic light; single subject implied.	Single subject explicit; pose, wardrobe, expression, light direction, set, lens + aspect—stable identity across reruns.
`video`	Any clean driver with the motion you want to reuse.	Same clip every run when comparing stills or prompt tweaks.
`instruction_prompt`	Empty — driver carries motion and audio.	Subject + motion beats + sync line; call out hands, props, and match lip sync and audio from the source video.
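The table maps onto three input fields named on this page: `video`, `images`, and `instruction_prompt`. The sketch below only assembles those fields into a Python dict. The request envelope, endpoint, authentication, and file upload are not covered here, so treating files as references such as URLs, and the helper name `build_animate_input`, are assumptions. Omitting `instruction_prompt` on a fast pass is one reading of "leave empty"; sending an empty string may behave the same.

```python
import warnings


def build_animate_input(video: str, images: list[str], instruction_prompt: str = "") -> dict:
    """Assemble the P-Video-Animate input fields named on this page.

    Assumptions: `video` and each entry of `images` are references to files the
    API can read (for example URLs); upload and the request envelope are out of scope.
    """
    if not video:
        raise ValueError("video is required: it is the driver clip")
    if not images:
        raise ValueError("images needs at least one reference still")
    if len(images) > 1:
        # Only the first entry is used, so extra stills are dropped with a warning.
        warnings.warn("only images[0] is used; extra stills are ignored", stacklevel=2)
    payload = {"video": video, "images": images[:1]}
    prompt = instruction_prompt.strip()
    if prompt:
        # Locked-in run: subject + motion beats + sync line.
        payload["instruction_prompt"] = prompt
    # Fast pass: no instruction_prompt, so the driver carries motion and audio.
    return payload


if __name__ == "__main__":
    print(build_animate_input(
        "https://example.com/driver.mp4",
        ["https://example.com/still.png"],
    ))
```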

Tip

For comprehensive video prompting (motion, framing, atmosphere), see the Video Generation guide.

How does it differ from other models?

Several models animate an image with the motion of a reference video. Among them, P-Video-Animate is the fastest and most cost-efficient, without compromising on quality.

The benchmark numbers below support this. They are directional and may vary with resolution, clip length, settings, provider, queue time, and test date.

P-Video-Animate vs. P-Video-Replace

Both models take a source video and reference image(s), but they are built for different workflows and are not interchangeable.

	P-Video-Animate	P-Video-Replace
What it does	Animates one image using motion, timing, and camera movement from a driver clip.	Replaces the character(s) in a video with the character(s) from reference stills.
What atmosphere you keep	The image’s atmosphere—look, lighting, wardrobe, and world of the still drive the output.	The video’s atmosphere—background, blocking, camera, and scene of the footage drive the output.
When to use it	You have an approved hero still and want it to perform like an existing take.	You have finished footage and want different people in the same shot.

Rule of thumb: Choose Animate when the still defines the world; choose Replace when the clip defines the world.

Choosing the right video model

Pruna ships four performance video models. They share the same prediction API, but each solves a different production problem. P-Video-Animate requires an existing source video to drive your still.

	P-Video	P-Video-Avatar	Animate (this page)	P-Video-Replace
One-line job	Generate new footage from prompts	Speak from one still (script or audio)	Retarget one still with clip motion	Swap characters in existing footage
You start with	Text prompt (+ optional image refs)	Portrait still + `voice_script` or `audio`	Source video + one still	Source video + 1–4 identity stills
You keep from the source	N/A (new scene)	Aspect ratio of the still	Motion, timing, camera movement, and optionally audio	Camera, timing, blocking, background
Typical ask	“Make a 10 s product ad in this style.”	“This spokesperson says this line in French.”	“Animate this catalog still using our winning ad take.”	“Put our creator in this UGC b-roll.”

Quick decision guide

No source video yet → use P-Video to create the plate, or P-Video-Avatar if you only need a talking head from a still.
Footage exists and the hero still should move like the driver → P-Video-Animate (only the first entry in images is used). See P-Video-Animate vs. P-Video-Replace.
Footage exists and you need different people in the same shot → P-Video-Replace (Model: p-video-replace; use instruction_prompt when multiple people are on screen).
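The quick decision guide reduces to two questions: does footage exist yet, and does the still or the clip define the world? A toy sketch of that logic follows; `choose_video_model` and its flags are hypothetical simplifications of the guide’s wording.

```python
def choose_video_model(has_source_video: bool,
                       talking_head_only: bool = False,
                       still_defines_world: bool = True) -> str:
    """Follow the quick decision guide and return a model name.

    Hypothetical helper; the flags are a simplification of the guide's wording.
    """
    if not has_source_video:
        # No footage yet: create a plate, or a talking head from one still.
        return "P-Video-Avatar" if talking_head_only else "P-Video"
    # Footage exists. Rule of thumb: Animate when the still defines the world,
    # Replace when the clip defines the world (different people, same shot).
    return "P-Video-Animate" if still_defines_world else "P-Video-Replace"


if __name__ == "__main__":
    # Catalog still that should perform like a winning ad take.
    print(choose_video_model(has_source_video=True))
```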

Note

P-Video-Avatar vs. P-Video-Animate: Avatar creates speech from a still (TTS or uploaded audio, lip-sync-focused). Animate copies motion from a driver video onto your still; it does not generate a new script. Use Avatar for new lines; use Animate when the timing and performance of an existing take should drive the still.

Tip

Common pipeline: generate stills with P-Image → P-Video-Avatar for new spokesperson clips → P-Video-Replace to drop talent into b-roll → P-Video-Animate to apply a hero still to motion from an avatar or ad clip.

Speed and throughput

Metric	Value (720p benchmark)
P-Video-Animate generation time per 1 s of output	5.24 s
P-Video-Animate wall-clock time for 5 s of output	26.2 s
Typical motion-transfer alternatives, per 1 s of output	~36.0–43.0 s
Typical motion-transfer alternatives, for 5 s of output	~180.0–215.0 s
P-Video-Animate price per second of output	$0.03 (vs. ~$0.07–$0.35 for comparable tools)
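The derived rows in this table come from multiplying the per-second figure by output length. This sketch reproduces that arithmetic and the resulting speedup range; it ignores queue time and treats the directional benchmark figures as fixed constants.

```python
# Directional 720p benchmark figures from the table above.
ANIMATE_SECONDS_PER_OUTPUT_SECOND = 5.24
ALTERNATIVES_SECONDS_PER_OUTPUT_SECOND = (36.0, 43.0)


def estimate_generation_seconds(
    output_seconds: float,
    seconds_per_output_second: float = ANIMATE_SECONDS_PER_OUTPUT_SECOND,
) -> float:
    """Generation time scales linearly with output length (queue time ignored)."""
    return round(output_seconds * seconds_per_output_second, 2)


def speedup_range() -> tuple[float, float]:
    """How many times faster Animate is than the low and high alternative figures."""
    low, high = ALTERNATIVES_SECONDS_PER_OUTPUT_SECOND
    return (
        round(low / ANIMATE_SECONDS_PER_OUTPUT_SECOND, 1),
        round(high / ANIMATE_SECONDS_PER_OUTPUT_SECOND, 1),
    )


if __name__ == "__main__":
    print(estimate_generation_seconds(5))  # 26.2
    print([estimate_generation_seconds(5, s) for s in ALTERNATIVES_SECONDS_PER_OUTPUT_SECOND])
    print(speedup_range())
```

The ratios come out at roughly 6.9× to 8.2×, which is where the "roughly 7×" figure under Key features comes from.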

Key features

P-Video-Animate fits the same Pruna API patterns as P-Video and P-Video-Avatar:

Fast, cost-efficient motion transfer: Benchmarked at 5.24 s of generation time per 1 s of output (720p), roughly 7–8× faster than typical motion-transfer alternatives (~36–43 s per 1 s of output) at a fraction of the per-second cost.
Single-image motion transfer: Upload a driver video and one image to retarget an approved still with the clip’s motion.
Long-form friendly at 720p: Supports videos up to ~2 minutes at 720p (subject to platform limits; confirm in your account).
Reliable everyday motion: Strong on normal movement and slow, controlled action—walking, talking heads, presenters, product demos.
Motion and scene structure preservation: Follows the source clip’s acting, timing, camera movement, and layout while applying the reference image’s look.
Audio-aware output: Control source audio with save_audio and ignore_audio (see Configuration).

Practical constraints

Output length follows the source video duration (within platform max length).
Output aspect ratio follows the source video.
Very fast action, heavy occlusion, or extreme camera motion may reduce consistency.
Use a clean, front-facing reference still when possible.
Only the first image in images is used.
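The first two constraints let you predict the output before submitting a run. The sketch below assumes a 120 s cap taken from the "~2 minutes at 720p" figure under Key features; confirm the real limit in your account. Whether an over-long clip is trimmed or rejected is not stated here, so the sketch only flags it. `SourceClip` and `predict_output` are hypothetical names.

```python
from dataclasses import dataclass
from math import gcd

# Assumption: ~2 minutes at 720p, per Key features; confirm the real limit in your account.
ASSUMED_MAX_OUTPUT_SECONDS = 120.0


@dataclass
class SourceClip:
    duration_s: float
    width: int
    height: int


def predict_output(source: SourceClip, max_output_s: float = ASSUMED_MAX_OUTPUT_SECONDS) -> dict:
    """Predict output length and aspect ratio from the driver clip.

    Length follows the source duration, capped at the assumed max; aspect
    ratio follows the source frame.
    """
    if source.width <= 0 or source.height <= 0:
        raise ValueError("width and height must be positive")
    g = gcd(source.width, source.height)
    return {
        "duration_s": min(source.duration_s, max_output_s),
        "aspect_ratio": f"{source.width // g}:{source.height // g}",
        # Flag only: trimming vs. rejection of long clips is not documented here.
        "exceeds_max": source.duration_s > max_output_s,
    }


if __name__ == "__main__":
    print(predict_output(SourceClip(duration_s=8.0, width=1080, height=1920)))
```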

Examples

The examples are grouped by Hugging Face folder (ugc_ads, film_casting, gaming, meme_remixes). Each card shows the reference image (`image`), the driver (`video`), a driver ↔ output comparison clip, the resolution, and copy-ready prompts. The still text is a P-Image prompt (see Domain Use Cases there for tone and structure): use it in P-Image to create the frame, then upload that file as the reference image (the first entry in `images`) in the API. The label `reference_image_prompt` in the copy blocks is documentation shorthand, not a request field.

Media: reference image, driver video, and driver ↔ output comparison (1080p, 9:16)

Example prompts

instruction_prompt: Animate the Filipina woman in the terracotta robe using the source video: she speaks to camera, lifts a serum bottle toward the lens, opens her palm, raises her eyebrows, and nods. Match lip sync and audio from the source video.
            reference_image_prompt: Vertical 9:16 three-quarter angle from slightly below, chest-up in a bathroom, Chest-up facing camera, both hands empty and relaxed at sides, mouth slightly open as if mid-sentence, shoulders squared. Photorealistic Filipina woman mid-twenties, warm golden-brown skin, faint beauty mark above left lip, wavy black hair in tortoiseshell claw clip, terracotta linen robe with rolled cuffs, thin gold hoop earrings and delicate chain necklace, sage-green subway tile bathroom with brass faucet, fogged mirror edge, and folded white towels on a bamboo shelf, amber ring-light catchlight in eyes, both hands empty and relaxed at sides, no product visible, soft steam haze near mirror edge, mouth slightly open as if speaking to camera, one person only.

Media: reference image, driver video, and driver ↔ output comparison (1080p, 9:16)

Example prompts

instruction_prompt: Animate the 2D illustrated fitness woman in the coral visor and black windbreaker using the source video: she leans toward camera, points forward, opens both hands in a listen-up gesture, then shakes her head with a half-smirk. Match lip sync and audio from the source video.
            reference_image_prompt: Vertical 9:16 three-quarter front chest-up in a gym, Three-quarter front toward camera, slight lean inward, both arms empty and relaxed at sides, mouth open mid-hook delivery. Clean 2D flat-vector animation portrait of an original fictional Latina fitness woman early thirties, warm tan skin, dark ponytail through coral visor, open black windbreaker with thick white sleeve stripes over grey crop tank, thin gold chain necklace, cel-shaded flat colors with crisp bold outlines, modern fitness-app mascot illustration style, not photorealistic, tight chest-up crop from mid-torso up only, shoulders and head fill the frame, legs and feet not visible, no full-body shot, three-quarter front toward camera with slight lean inward, orange resistance tubes and kettlebells blurred behind on rubber mat, charcoal gym walls with faded motivational stencil text, harsh overhead fluorescents with green rim light from left, both hands empty at sides, mouth open mid-sentence, one illustrated character only.

Media: reference image, driver video, and driver ↔ output comparison (1080p, 9:16)

Example prompts

instruction_prompt: Animate the Black man in the grey tank top using the source video: he keeps both hands on the shaker bottle at his chest throughout, lifts the bottle toward the camera with both hands still gripping it without shaking, raises his eyebrows with a half-smirk, then nods. Match lip sync and audio from the source video.
            reference_image_prompt: Vertical 9:16 straight-on chest-up on an outdoor running track at midday under open sky, Chest-up facing camera, exactly one white shaker bottle centered at chest with both hands gripping the same single bottle, forearms bent inward, bottle vertical and still, mouth open mid-hook, shoulders relaxed. Photorealistic Black man late twenties, deep brown skin, close-cropped fade haircut, grey performance tank top with sweat sheen, silver chain necklace, one white shaker bottle with black lid only, both forearms bent gripping one bottle at center chest, not one bottle per hand, no second bottle, no bottle in left hand alone, no bottle in right hand alone, no floating hands, no arm at side, outdoor running track with red lane lines on dark rubber surface, chain-link fence and empty metal bleachers behind, bright natural sunlight and blue sky, open-air sports field environment, not inside any building, no kitchen no cabinets no countertop no oven no indoor room, straight-on chest-up, mouth open mid-sentence, one person only.

Media: reference image, driver video, and driver ↔ output comparison (1080p, 9:16)

Example prompts

instruction_prompt: Animate the South Asian woman in the cream cardigan using the source video: she taps her phone screen, lifts the phone toward the camera, opens her left palm, and nods with a half-smile. Match lip sync and audio from the source video.
            reference_image_prompt: Vertical 9:16 over-shoulder three-quarter chest-up in a sunlit apartment living room, Over-shoulder three-quarter toward camera, phone held at chest in right hand with screen facing lens, left hand empty at side, mouth open mid-demo line, shoulders relaxed. Standing upright chest-up in front of linen sofa not sitting, torso and face three-quarter toward camera, eyes toward lens, right hand holds phone vertically at chest with budget app screen facing camera, left arm hanging relaxed empty at side, no pointing, no cross-legged pose, no profile looking down at phone. Photorealistic South Asian woman late twenties, warm brown skin, black hair in low sleek ponytail, cream ribbed cardigan over white tee, thin gold huggie earrings, linen sofa with sage throw pillow and fiddle-leaf fig blurred behind, warm afternoon window light from left, phone screen shows blurred colorful budget chart UI without readable text, standing app-demo pose matching driver opening frame, mouth open mid-sentence, one person only.

Media: reference image, driver video, and driver ↔ output comparison (1080p, 9:16)

Example prompts

instruction_prompt: Animate the East Asian man in the navy apron using the source video: he lifts the meal box lid halfway, peels back the paper liner to reveal portions, raises his eyebrows with a grin, then shrugs. Match lip sync and audio from the source video.
            reference_image_prompt: Vertical 9:16 high downshot chest-up over a kitchen counter, High downshot chest-up over a kitchen counter, both palms flat on a closed meal box lid at center frame, elbows on counter edge, mouth open mid-hook, shoulders slightly hunched toward camera. Clean 2D flat-vector animation portrait of an original fictional Black woman home chef early thirties, deep brown skin, natural hair in a high puff, sunflower-yellow apron over teal long-sleeve tee, cel-shaded flat colors with crisp bold outlines, modern meal-kit app mascot illustration style, not photorealistic, tight chest-up crop from mid-torso up only, shoulders and head fill the frame, closed coral meal-kit box flat on counter with illustrated food icons and no readable brand name, box never lifted off counter, pastel kitchen counter with copper kettle and herb jar blurred behind, warm under-cabinet light, high downshot chest-up, mouth open mid-sentence, one illustrated character only.

Media: reference image, driver video, and driver ↔ output comparison (1080p, 9:16)

Example prompts

instruction_prompt: Animate the 3D cel-shaded wellness mascot in the sage wrap top using the source video: she speaks with lip sync in a tight face close-up, leans slightly toward camera, tilts her head with a smile, shrugs her shoulders with an eyebrow pop, then nods. No hands or arms visible at any point—face and shoulders only. Never show lotion, cream, or white product on her face. Match lip sync and audio from the source video.
            reference_image_prompt: 3D cel-shaded CGI mascot portrait only, Pixar-style mature adult woman age 25, not a photograph, not photorealistic, not a child not a toddler not a baby. Vertical 9:16 macro face-only close-up on a sunrise rooftop terrace, Macro face-only close-up facing camera, forehead to chin fills entire frame, no shoulders visible, no neck below jawline, no hands, no arms, no torso, mouth slightly open mid-sentence, face completely clean with uniform peachy skin and no white cream or cosmetic marks. Original fictional wellness app mascot, not based on any existing franchise. Stylized CGI with soft subsurface skin, large expressive eyes, lavender hair in a loose side braid with small jade hair clip, sage-green wrap top collar barely visible at bottom edge, soft peach sunrise sky and blurred bamboo bokeh behind, premium mobile game cel-shaded render, one 3D character only, single subject, no group.

Media: reference image, driver video, and driver ↔ output comparison (1080p, 9:16)

Example prompts

instruction_prompt: Animate the woman in the sand hijab and camel coat using the source video: she gives a small fist pump, turns her head slightly toward camera, opens her palm outward, then nods with a smile. Match lip sync and audio from the source video.
            reference_image_prompt: Vertical 9:16 profile chest-up on a commuter train window seat, Profile chest-up facing right, wireless earbuds visible, both hands empty at sides, mouth open mid-sentence, chin slightly lifted. Photorealistic Middle Eastern woman early twenties, olive skin, hijab in soft sand linen, white wireless earbuds, camel wool coat over cream turtleneck, blurred city lights through rain-streaked train window, cool blue ambient light, profile chest-up facing right, mouth open mid-sentence, one person only.

Media: reference image, driver video, and driver ↔ output comparison (1080p, 9:16)

Example prompts

instruction_prompt: Animate the redhead woman in the sage robe using the source video: she speaks to camera with lip sync, runs her fingers through her hair with one hand, flips her hair toward the camera, briefly frames her face with both hands then drops them, and raises her eyebrows with a nod. Match lip sync and audio from the source video.
            reference_image_prompt: Vertical 9:16 eye-level straight-on chest-up in a bathroom vanity nook, Eye-level straight-on chest-up facing camera, both arms hanging straight down at sides with hands relaxed beside hips, no hands raised, no hands touching hair or face, mouth slightly open mid-sentence, shoulders squared. Photorealistic white woman mid-twenties, fair skin with light freckles, copper-red wavy hair past shoulders with visible volume and shine, sage-green satin robe, single thin gold chain necklace, pink vanity with round Hollywood mirror and skincare bottles blurred behind her, warm soft vanity light catchlights, anatomically correct hands with five fingers each resting beside hips below chest line, no product visible, one person only.

Media: reference image, driver video, and driver ↔ output comparison (1080p, 9:16)

Example prompts

instruction_prompt: Animate the anime-style streamer in the violet hoodie using the source video: he slaps the desk edge, points toward the monitor, leans back laughing with palms up, then gives a quick double fist pump with lip sync. Match lip sync and audio from the source video.
            reference_image_prompt: Vertical 9:16 low-angle chest-up at a neon-lit gaming desk, Low-angle chest-up facing camera, both hands resting on desk edge below frame, mouth open mid-reaction, shoulders pulled back in disbelief. Original fictional anime-style streamer portrait, not based on any existing franchise. Clean cel-anime illustration of a young man with teal undercut hair, violet gaming hoodie with white drawstrings, black over-ear headset with magenta LED ring, crisp ink outlines and flat cel shading, not photorealistic, dual monitors with blurred victory overlay behind, low-angle chest-up toward camera, mouth open mid-sentence, one illustrated character only.

Media: reference image, driver video, and driver ↔ output comparison (1080p, 16:9)

Example prompts

instruction_prompt: Animate the animated storybook heroine in the teal ball gown using the source video: she speaks with natural lip sync, lifts her eyes upward, presses her hands closer to her chest, then settles into a gentle smile and nod. Match lip sync and audio from the source video.
            reference_image_prompt: Horizontal 16:9 chest-up three-quarter shot in a moonlit enchanted glade, Chest-up three-quarter toward camera, both hands clasped at center chest, shoulders soft, mouth slightly open in quiet wonder, eyes bright. Original fictional animated storybook heroine, not based on any existing franchise or copyrighted character. Stylized 3D cel-shaded CGI render with soft subsurface skin, large expressive eyes, rose-gold tiara over auburn corkscrew curls, teal ball gown with gold embroidery and puffed sleeves, fireflies and bioluminescent mushrooms in a moonlit enchanted glade with soft blue rim light, chest-up three-quarter toward camera, both hands clasped at chest, mouth slightly open in quiet wonder, premium family-animation feature-film look, one character only.

Media: reference image, driver video, and driver ↔ output comparison (1080p, 16:9)

Example prompts

instruction_prompt: Animate the claymation sailor using the source video: he blinks and lifts his head, stretches both arms overhead, lowers them, looks left and back to camera, then exhales with a small smile. Match lip sync and audio from the source video.
            reference_image_prompt: Horizontal 16:9 eye-level medium full-body shot on a handmade miniature dock set, Eye-level medium full-body front-facing, clay figure seated on miniature stool, both arms resting at sides with visible clay thumbprints, head level, mouth closed, eyes half-lidded as if just waking. Original fictional stop-motion claymation sailor character, not based on any existing franchise. Hand-sculpted clay figure with visible fingerprint textures, navy knit cap and striped shirt, rosy cheek blobs, seated on a miniature wooden dock with painted blue water backdrop and coiled rope prop, handmade stop-motion puppet aesthetic with warm tungsten desk-lamp light, eye-level medium full-body front-facing on stool, arms at sides, eyes half-lidded, one clay character only.

Media: reference image, driver video, and driver ↔ output comparison (1080p, 16:9)

Example prompts

instruction_prompt: Animate the watercolor storybook knight in the emerald tabard using the source video: he speaks with natural lip sync, lifts his chin slightly, presses his hand firmer to his chest, then holds a small nod with steady eye contact. Match lip sync and audio from the source video.
            reference_image_prompt: Horizontal 16:9 chest-up straight-on shot on a watercolor storybook castle rampart, Chest-up straight-on toward camera, right hand over heart, left arm at side, shoulders squared, mouth closed with a solemn half-smile, eyes steady on camera. Original fictional watercolor storybook knight illustration, not based on any existing franchise. Hand-painted gouache style with visible paper grain and soft pigment blooms, emerald tabard with gold trim over chainmail, russet hair under a simple steel circlet, misty castle rampart with pennant and distant green hills, chest-up straight-on toward camera, right hand over heart, left arm at side, solemn half-smile, children's picture-book illustration aesthetic, one character only.

Media: reference image, driver video, and driver ↔ output comparison (1080p, 16:9)

Example prompts

instruction_prompt: Animate the frost-mage commander in the cobalt battle coat using the source video: he keeps both hands on the map table, sweeps his right hand left across the projection, returns it flat to the table edge, and holds with a small nod. Match audio from the source video.
            reference_image_prompt: Horizontal 16:9 chest-up three-quarter shot over a frost-lit ice citadel war-room holographic map table, Chest-up three-quarter toward holographic map table, both hands flat on table edge, jaw set, eyes on glowing projection, shoulders squared. Original fictional RTS frost-mage commander skin, not based on any existing franchise. Photorealistic East Asian man forties, silver undercut, trimmed black beard, white fur-lined cobalt battle coat with crystal rank pins, frost-blue ear-com mic, glowing cyan terrain projection on dark glass table with ice hex grid and amber siege markers, carved ice walls with hanging lantern chains and blurred frost golem silhouettes, chest-up three-quarter over map with both hands on table edge, premium UE5 cinematic render, no HUD, one character only.

Media: reference image, driver video, and driver ↔ output comparison (1080p, 16:9)

Example prompts

instruction_prompt: Animate the ice paladin using the source video: she stands in neutral idle with empty hands at her sides, shifts weight back, raises both hands into ready guard, and holds the pose. Match lip sync and audio from the source video.
            reference_image_prompt: Horizontal 16:9 eye-level full-body front-facing hero key art, First-frame pose and camera locked — match the driver video frame 0 exactly (same subject facing, limb positions, joint angles, and shot scale): Eye-level full-body front-facing neutral idle, both arms relaxed at sides with empty hands visible, feet planted, weight even, no weapon. Original fictional game character, not based on any existing franchise. Playable ice paladin heroine skin, silver frost plate armor with icicle pauldrons and fur-lined collar, white braid with silver circlet and blue gem, cracked cobblestone frost arena with frozen puddles, blue torch braziers, hanging icicle banners, eye-level full-body front-facing neutral idle with empty hands at sides, snow particles in air, premium UE5 key art, single character only, one front-facing view, no turnaround sheet, no weapon, no HUD.

Reference image image

Driver video

Driver ↔ output

1080p, 16:9

Example prompts

instruction_prompt: Animate the crystal dragon using the source video: it holds jaws closed and wings half-spread, stalks forward with heavy footfalls, spreads its wings wider, lifts its head slightly, and leans forward menacingly. Match audio from the source video.
reference_image_prompt: Horizontal 16:9 low-angle boss reveal shot, First-frame pose and camera locked — match the driver video frame 0 exactly (same subject facing, limb positions, joint angles, and shot scale): Low-angle full-body front-facing menacing idle, chin slightly down, limbs held wide at sides not raised overhead, feet planted, weight forward, torso facing camera. Raid boss crystal dragon skin, teal crystalline scales with frost fractures and refracted light shards, ice horn crown with chipped tips, glowing cyan chest core, wings half-spread with translucent membrane veins and dangling icicles, jaws closed, frozen cavern arena with icicle stalactites, blue mist floor fog, and cracked ice pillars, low-angle boss reveal framing, premium MMO cinematic graphics, no HUD.

Reference image, driver video, and driver ↔ output compare (1080p, 16:9)

Example prompts

instruction_prompt: Animate the crystal knight in iridescent plate armor using the source video: she holds a crouch with her fist on the ground, pushes up to stand, snaps her right fist to her chest with her left arm back, and lifts her chin slightly. Match audio from the source video.
reference_image_prompt: Horizontal 16:9 low-angle full-body shot on a stormy floating skybridge landing pad above cloud sea, Low-angle full-body front-facing crouch on landing pad, both knees bent, right fist touching ground, left arm back for balance, chin down, feet visible on metal grating. Original fictional battle-pass crystal knight skin, not based on any existing franchise. Iridescent plate armor with violet crystal pauldrons and gold trim, braided crimson hair with rain beads, rune-etched gauntlet on right fist touching grating, stormy skybridge with lightning-lit cloud void below, spinning teal holographic spawn ring on wet metal grating, distant floating citadel spires in bokeh, low-angle full-body front-facing crouch with fist on ground, premium UE5 cinematic render, no HUD, one character only.

Reference image, driver video, and driver ↔ output compare (1080p, 9:16)

Example prompts

instruction_prompt: Animate the stylized gacha arcanist heroine in the indigo velvet robes using the source video: she speaks to camera with natural lip sync throughout, opens her right palm briefly, raises her right fist to her chest while talking, then nods with a small smile. Match lip sync and audio from the source video.
reference_image_prompt: Vertical 9:16 eye-level chest-up shot in a violet crystal gacha summon hall with floating banner light orbs, Vertical chest-up front-facing toward camera, both hands empty and relaxed at sides, mouth slightly open mid-sentence for lip sync, shoulders squared, direct eye contact. Original fictional gacha arcanist heroine skin, not based on any existing franchise. Stylized cel-shaded mobile RPG heroine with large expressive anime eyes, smooth porcelain skin, deep indigo velvet robe dress with fitted high-neck bodice and gold celestial embroidery, silver-white hair with star-shaped hairpin and neat tight side braid, closed spell tome on gold chain at hip, violet crystal archways with suspended holographic banner cards and soft particle motes, warm gold key light from above, chest-up front-facing with mouth slightly open as if speaking to camera, high-saturation premium mobile gacha game cinematic render, no UI text, one character only.

Reference image, driver video, and driver ↔ output compare (1080p, 9:16)

Example prompts

instruction_prompt: Animate the rogue in the teal jacket using the source video: she keeps both hands clasped at her chest, pulls them apart, forms a dazzled grin, leans forward slightly, and nods once. Match audio from the source video.
reference_image_prompt: Vertical 9:16 eye-level chest-up shot in a neon gacha summon chamber with holographic loot cards, Vertical chest-up front-facing, both hands clasped at center chest, shoulders slightly forward, eyes wide with anticipation, mouth closed. Original fictional gacha rogue skin, not based on any existing franchise. Teal leather jacket with magenta circuit embroidery, silver undercut with holographic ear cuff, floating translucent loot cards with gold rarity frames orbiting shoulders, violet neon summon arch with particle sparks and chrome floor reflections, chest-up front-facing with both hands clasped at chest, high-saturation mobile game cinematic render, no UI text, one character only.

Reference image, driver video, and driver ↔ output compare (1080p, 9:16)

Example prompts

instruction_prompt: Animate the ranger in copper scale mail and antler pauldrons using the source video: she holds a fist at her chest, takes two steps forward, raises her fist higher, and holds with a subtle shoulder flourish. Match audio from the source video.
reference_image_prompt: Vertical 9:16 wide medium-long full-body shot on a castle rampart at golden hour, generous headroom and footroom with visible rampart stones ahead of the feet for forward stride, First-frame pose and camera locked — match the driver video frame 0 exactly (same subject facing, limb positions, joint angles, and shot scale): Eye-level full-body front-facing on rampart stones, right fist at chest, left arm extended back, feet planted shoulder-width, cape caught in breeze. Original fictional season-pass ranger skin, not based on any existing franchise. Copper scale mail with antler pauldrons, forest-green wool cape with gold trim, braided auburn hair with leather circlet, castle rampart with stone merlons, rippling crimson banner flags, golden sunset sky and distant misty valley armies blurred below, wide medium-long full-body front-facing on rampart with character centered occupying roughly 70% of frame height, empty space above head and below boots, premium mobile RPG cinematic render, no UI text, one character only.

Reference image, driver video, and driver ↔ output compare (1080p, 16:9)

Example prompts

instruction_prompt: Animate the oil-painting noblewoman in the emerald velvet gown using the source video: she turns sharply to her left, brings both hands to her mouth in a gasp, inhales with raised shoulders, then holds a shocked expression. Match lip sync and audio from the source video.
reference_image_prompt: Horizontal 16:9 medium full-body three-quarter shot in a candlelit baroque hall, First-frame pose and camera locked — match the driver video frame 0 exactly (same subject facing, limb positions, joint angles, and shot scale): Medium full-body three-quarter toward camera, both hands clasped at chest, mouth slightly open, feet planted. Original fictional oil-painting portrait of a noblewoman come to life, solo subject centered, visible canvas texture and cracked varnish, emerald velvet gown with pearl choker, powdered updo with beauty mark, empty hall with marble floor and velvet drapes and landscape paintings with no human figures, medium full-body three-quarter toward camera, hands clasped at chest, one painted noblewoman only, no triptych, no split screen, no collage, single continuous frame.

Reference image, driver video, and driver ↔ output compare (1080p, 16:9)

Example prompts

instruction_prompt: Animate the upright capybara in the olive cardigan using the source video: it keeps both paws on the mug, snaps its head back to face camera with raised eyebrows, leans back slightly with widened eyes, then exhales with a small half-smirk. Match lip sync and audio from the source video.
reference_image_prompt: Horizontal 16:9 medium chest-up straight-on shot in a rain-streaked corner cafe booth, First-frame pose and camera locked — match the driver video frame 0 exactly (same subject facing, limb positions, joint angles, and shot scale): Medium chest-up straight-on toward camera, head turned to camera-right looking off-frame, both hands wrapped around a white ceramic mug at chest height, neutral closed-mouth expression. Cinematic photorealistic capybara standing upright on two hind legs with balanced humanoid proportions like a cafe regular, wearing a fitted olive wool cardigan over cream henley, small round wire-frame glasses, rain-streaked window behind with blurred amber streetlights and passing umbrella silhouettes, warm Edison bulb catchlights, wooden booth table edge visible, medium chest-up straight-on, both paws on mug, one anthropomorphic capybara only.

Reference image, driver video, and driver ↔ output compare (1080p, 16:9)

Example prompts

instruction_prompt: Animate the upright bulldog in the charcoal three-piece suit using the source video: it keeps both arms open wide, takes two confident steps forward, raises one arm straight up with the other on its hip, then lowers its arms and settles. Match lip sync and audio from the source video.
reference_image_prompt: Horizontal 16:9 wide eye-level full-body shot on a foggy Thames riverside promenade at dawn, First-frame pose and camera locked — match the driver video frame 0 exactly (same subject facing, limb positions, joint angles, and shot scale): Wide eye-level full-body front-facing, both arms extended open at shoulder height, feet planted shoulder-width apart, torso facing camera, neutral confident expression. Cinematic photorealistic English bulldog standing upright on two hind legs with balanced humanoid proportions like a business presenter, wearing a fitted charcoal three-piece suit with burgundy tie and pocket square, polished brown oxford shoes, leather messenger bag strap across shoulder, solo subject centered on empty foggy Thames promenade with wet cobblestones and blurred Parliament silhouette across the water, soft blue dawn light and river mist, wide eye-level full-body front-facing, feet visible, both arms open at shoulder height, one anthropomorphic bulldog only, single continuous frame.

Reference image, driver video, and driver ↔ output compare (1080p, 16:9)

Example prompts

instruction_prompt: Animate the photorealistic golden retriever in the emerald blazer behind the desk using the source video: its mouth and jaw follow the driver's speech with accurate lip sync throughout, speaking the line with the same timing and emphasis as the human founder. Front paws stay hidden under the desk. Small head nod and serious half-smile. Match lip sync and audio from the source video exactly.
reference_image_prompt: Horizontal 16:9 locked static eye-level chest-up webcam shot, photorealistic golden retriever seated behind an office desk facing camera, First-frame pose and camera locked — match the driver video frame 0 exactly (same subject facing, limb positions, joint angles, and shot scale): Chest-up behind desk facing camera, photorealistic golden retriever seated like a founder, mouth open mid-speech for lip sync, front paws tucked under the desk out of sight, direct eye contact, fixed static webcam framing. Ultra-photorealistic golden retriever dog sitting upright on an office chair behind an oak desk like a SaaS founder on a webcam, natural canine anatomy and fur texture, wearing a real-sized emerald green blazer fitted over shoulders, cream collar visible, warm amber eyes with calm direct contact, oak desk top with closed MacBook and coffee mug in foreground, front paws completely hidden underneath the desk below the frame, no paws on desk surface, terracotta accent wall, fiddle-leaf fig and brass banker lamp blurred behind, locked static chest-up webcam framing, mouth slightly open mid-speech as if talking to camera, natural daylight, one real dog only.

Reference image, driver video, and driver ↔ output compare (1080p, 9:16)

Example prompts

instruction_prompt: Animate the freckled woman in the lavender cardigan using the source video: she keeps both hands beside her ears touching the oversized vintage dangle earrings only—never bananas, fruit, or peel props—puffs her cheeks into a duck-face pout, then breaks into a wide smile while holding the earring pose. Match audio from the source video.
reference_image_prompt: Vertical 9:16 eye-level chest-up shot in a vintage thrift-store fitting room with warm tungsten bulbs, Vertical chest-up front-facing, both hands raised beside each ear with fingertips lightly touching oversized vintage gold dangle earrings, neutral expression with mouth slightly open, eyes looking straight at camera. Real human photograph, not CGI, not cartoon, not animal. Photorealistic white woman early twenties, fair freckled skin with visible pores and natural blemishes, auburn hair in a messy low bun with loose strands, fitted lavender cardigan over cream turtleneck, small enamel mushroom pin on chest, large mismatched thrift-store dangle earrings with amber beads and tiny charms catching tungsten light, velvet curtain and vintage coat rack blurred behind, warm tungsten bulb catchlights and soft shadow falloff, vertical chest-up front-facing, both hands touching earrings at ears, no bananas, no fruit, no peel props anywhere in frame including background, documentary smartphone selfie realism, one woman only, single continuous frame.

Reference image, driver video, and driver ↔ output compare (1080p, 9:16)

Example prompts

instruction_prompt: Animate the upright sloth in the lavender button-down using the source video: it rolls its eyes upward slowly, returns to a flat stare, lifts its shoulders into a shrug with palms up, then drops the shrug into an exasperated half-smile. Match lip sync and audio from the source video.
reference_image_prompt: Vertical 9:16 eye-level chest-up shot in a fluorescent open-plan office with glass partitions, Vertical chest-up front-facing, both arms relaxed at sides, neutral closed-mouth expression, eyes looking straight at camera. Cinematic photorealistic three-toed sloth standing upright on two legs like an office worker, wearing a fitted lavender button-down with rolled sleeves and lanyard badge, soft grey-brown fur, dark eye patches, blurred standing desks and monitor glow behind glass walls, cool overhead fluorescent with warm monitor rim light, vertical chest-up front-facing, both claws folded at sides like arms, one anthropomorphic sloth only.

Reference image, driver video, and driver ↔ output compare (1080p, 9:16)

Example prompts

instruction_prompt: Animate the woman in the sage sweater and tortoiseshell glasses using the source video: she already holds the plate at chest height with both hands from the first frame, keeps both hands firmly on the plate with the freshly baked brownies throughout and never sets it down, sniffs with eyes closed and pursed lips, opens her eyes into a warm smile, and lifts the plate slightly toward camera. Match audio from the source video.
reference_image_prompt: Vertical 9:16 eye-level chest-up shot in a bright home living room with gray sofa and white wall paneling blurred behind, Vertical chest-up front-facing, both hands already holding one white ceramic plate lifted at chest height with palms supporting the underside and fingers curled around the rim, several freshly baked chocolate brownies with crackly tops and fudgy centers clearly visible on that plate, head tilted slightly down toward the plate, nose almost touching the brownies, eyes closed, lips pursed mid-sniff. Real human photograph, not CGI, not cartoon, not animal. Photorealistic Latina woman early thirties, olive skin, dark wavy hair in a loose ponytail, sage-green sweater, round tortoiseshell glasses, natural skin texture with visible pores, soft window light from the side, plate and brownies fully visible in frame, documentary smartphone realism, no pancakes, no asparagus, no vegetables, no green food, no table or counter under the plate, no empty hands, one plate only, one woman only, no diptych, no split screen, no collage, single continuous frame.

Reference image, driver video, and driver ↔ output compare (1080p, 9:16)

Example prompts

instruction_prompt: Animate the upright ferret in the yellow windbreaker and rain boots using the source video: it bounces in a criss-cross leg pattern with opposite arm swings, widens the arm swing slightly, then holds mid-bounce with a small grin. Match audio from the source video.
reference_image_prompt: Vertical 9:16 eye-level full-body shot on an empty subway platform with yellow safety line, Vertical full-body front-facing, feet together, both arms relaxed at sides, neutral closed-mouth expression, shoulders squared toward camera. Cinematic photorealistic ferret standing upright on two legs like a commuter, wearing matte black rain boots and a cropped yellow windbreaker over grey hoodie, sleek brown fur with cream chest patch, small enamel transit pin on collar, tiled platform wall with fluorescent overhead panels and warm tunnel glow at frame edge, vertical full-body front-facing, feet visible in boots, both arms at sides, one anthropomorphic ferret only, single continuous frame.
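Most instruction_prompt examples above share one shape: "Animate the <subject> using the source video: <motion beats>. Match (lip sync and) audio from the source video." Below is a minimal Python sketch that assembles prompts in that shape so repeated runs stay consistent. The helper names are hypothetical and not part of any Pruna SDK, and the template is inferred from the examples rather than a documented requirement of the API.

```python
from typing import Sequence


def join_beats(beats: Sequence[str]) -> str:
    """Join motion beats as "a, b, and c", the serial-comma style used in the examples."""
    if not beats:
        raise ValueError("at least one motion beat is required")
    if len(beats) == 1:
        return beats[0]
    if len(beats) == 2:
        return f"{beats[0]} and {beats[1]}"
    return ", ".join(beats[:-1]) + ", and " + beats[-1]


def compose_instruction_prompt(subject: str, beats: Sequence[str], lip_sync: bool = True) -> str:
    """Build an instruction_prompt in the template inferred from the examples (hypothetical helper).

    The first beat should carry the pronoun ("she stands ...", "it holds ...").
    lip_sync=False gives the audio-only closing used for clips without dialogue.
    """
    closing = (
        "Match lip sync and audio from the source video."
        if lip_sync
        else "Match audio from the source video."
    )
    return f"Animate the {subject} using the source video: {join_beats(beats)}. {closing}"
```

Passing the ice paladin's four beats reproduces that example's instruction_prompt word for word. Prompts whose last beat starts with "then" (the ferret example, for instance) read better when written by hand.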

Integration

P-Video-Animate uses the same Pruna prediction API as P-Video. Upload the source video and one reference image, create a prediction, and then either poll for the result or send the Try-Sync header, as with other video models.

Tip

For more information on how to use the API, see the API Reference.

API endpoint: https://api.pruna.ai/v1/predictions (file uploads go to https://api.pruna.ai/v1/files)

Authentication

Send your API key in the apikey header of every request:

-H 'apikey: YOUR_API_KEY'
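A small Python sketch for keeping the key out of source code: it reads the key from an environment variable and returns the apikey header used in the curl examples. The variable name PRUNA_API_KEY is a hypothetical choice for this sketch, not something the API requires.

```python
import os
from typing import Dict


def auth_headers(env_var: str = "PRUNA_API_KEY") -> Dict[str, str]:
    """Return the apikey header, reading the key from an environment variable.

    PRUNA_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"set {env_var} to your Pruna API key")
    return {"apikey": api_key}
```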

Step 1: Upload source video and reference image

curl -X POST "https://api.pruna.ai/v1/files" \
  -H "apikey: YOUR_API_KEY" \
  -F "content=@/path/to/source.mp4"

curl -X POST "https://api.pruna.ai/v1/files" \
  -H "apikey: YOUR_API_KEY" \
  -F "content=@/path/to/reference.jpg"

Each upload returns a file URL. Pass the video's URL as video and the image's URL as the first entry in images.
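The two curl -F uploads can be sketched in Python with only the standard library. This version builds the multipart/form-data body by hand, using the same "content" form field as the curl commands. It assumes the files endpoint replies with JSON. This page does not show the response body, so the sketch returns the parsed object as-is instead of guessing which key holds the file URL.

```python
import json
import mimetypes
import os
import urllib.request
import uuid
from typing import Any, Tuple

FILES_URL = "https://api.pruna.ai/v1/files"


def encode_multipart(field: str, filename: str, data: bytes, content_type: str) -> Tuple[bytes, str]:
    """Encode a single file part as multipart/form-data, like curl -F does."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def upload_file(path: str, api_key: str) -> Any:
    """POST one local file to the files endpoint under the form field "content"."""
    with open(path, "rb") as fh:
        data = fh.read()  # whole file in memory; fine for short clips and stills
    content_type = mimetypes.guess_type(path)[0] or "application/octet-stream"
    body, multipart_type = encode_multipart("content", os.path.basename(path), data, content_type)
    request = urllib.request.Request(
        FILES_URL,
        data=body,
        method="POST",
        headers={"apikey": api_key, "Content-Type": multipart_type},
    )
    with urllib.request.urlopen(request) as response:
        # Assumption: the reply is JSON; look up the file URL in it before Step 2.
        return json.load(response)
```

Call upload_file once for source.mp4 and once for reference.jpg, then inspect each returned object for the file URL.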

Step 2: Create generation request

Animate mode (synchronous)

curl -X POST 'https://api.pruna.ai/v1/predictions' \
-H 'Content-Type: application/json' \
-H 'apikey: YOUR_API_KEY' \
-H 'Model: p-video-animate' \
-H 'Try-Sync: true' \
-d '{
  "input": {
    "video": "https://api.pruna.ai/v1/files/file-video123",
    "images": [
      "https://api.pruna.ai/v1/files/file-portrait"
    ],
    "resolution": "720p",
    "fps": 24,
    "save_audio": true,
    "seed": 42
  }
}'
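The synchronous curl request above translates directly to Python's urllib.request. The sketch below builds the same JSON body and headers (Model, apikey, and optionally Try-Sync) and leaves the seed out when none is given, so the service picks a random one. The 600-second timeout is an arbitrary choice for this sketch. The response schema is not described on this page, so send() just returns the parsed JSON.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

PREDICTIONS_URL = "https://api.pruna.ai/v1/predictions"


def build_animate_request(
    api_key: str,
    video_url: str,
    image_url: str,
    resolution: str = "720p",
    fps: int = 24,
    save_audio: bool = True,
    seed: Optional[int] = None,
    try_sync: bool = True,
) -> urllib.request.Request:
    """Build the same POST as the synchronous curl example."""
    inputs: Dict[str, Any] = {
        "video": video_url,
        "images": [image_url],  # only the first image is used
        "resolution": resolution,
        "fps": fps,
        "save_audio": save_audio,
    }
    if seed is not None:  # omit seed to get a random one
        inputs["seed"] = seed
    headers = {
        "Content-Type": "application/json",
        "apikey": api_key,
        "Model": "p-video-animate",
    }
    if try_sync:
        headers["Try-Sync"] = "true"
    body = json.dumps({"input": inputs}).encode("utf-8")
    return urllib.request.Request(PREDICTIONS_URL, data=body, headers=headers, method="POST")


def send(request: urllib.request.Request, timeout: float = 600.0) -> Any:
    """Send the request and parse the JSON reply (schema not shown on this page)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

urllib normalizes header names (Try-Sync is stored as Try-sync). This is harmless because HTTP header names are case-insensitive.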

Configuration

Required parameters

Parameter	Type	Description
video	file/string	Source (driver) RGB video in `.mp4` format; supplies the motion, timing, and camera movement.
images	file[] / string[]	Reference image(s). Animate uses the first image only.

Optional parameters

Parameter	Type	Default	Description
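instruction_prompt	string	`""`	Optional text guidance. Usually left empty; fill it in to name the subject, motion beats, and lip-sync rules when you need repeatable runs.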
resolution	string	`1080p`	Target megapixel budget: `720p` ≈ 1 MP, `1080p` ≈ 2 MP (aspect ratio preserved).
fps	integer	`24`	Frames per second of the output video.
save_audio	boolean	`true`	Save the video with audio.
ignore_audio	boolean	`false`	Ignore source audio for prompt conditioning and return a silent output video.
disable_safety_checker	boolean	`false`	Disable safety checker for generated videos (platform UI may still enforce checks).
seed	integer	random	Random seed for reproducible output. Omit it to get a random seed.
no_op	boolean	`false`	Health-check mode: returns a status without running inference.
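The resolution option is a pixel budget rather than a fixed frame size, so the output dimensions depend on the source aspect ratio. The rough Python estimate below solves width × height = budget at the source aspect ratio. It assumes "≈ 1 MP" and "≈ 2 MP" mean the pixel counts of 1280×720 and 1920×1080, and it snaps to even numbers. Both choices are hypothetical: the service's exact budget and rounding are not documented here.

```python
import math
from typing import Tuple

# Assumption: budgets read as the pixel counts of 1280x720 and 1920x1080.
PIXEL_BUDGET = {"720p": 1280 * 720, "1080p": 1920 * 1080}


def estimate_output_size(
    src_width: int, src_height: int, resolution: str = "1080p", multiple: int = 2
) -> Tuple[int, int]:
    """Estimate an output size that keeps the source aspect ratio within a pixel budget."""
    budget = PIXEL_BUDGET[resolution]
    # Solve w * h = budget subject to w / h = src_width / src_height.
    width = math.sqrt(budget * src_width / src_height)
    height = math.sqrt(budget * src_height / src_width)

    def snap(x: float) -> int:
        # Round to the nearest multiple (hypothetical rounding rule).
        return max(multiple, int(round(x / multiple)) * multiple)

    return snap(width), snap(height)
```

Under these assumptions a 16:9 source at 1080p comes out at 1920×1080, and a 9:16 source at 720p at 720×1280.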

Supported option values

resolution: 720p, 1080p.
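Before spending credits, you can catch common mistakes locally. This Python sketch checks an input dict against the parameter tables above: required fields, the two supported resolutions, a positive integer fps, and only the first image being used. It also flags unknown keys such as reference_image_prompt, which is not a request field. The warning about ignore_audio combined with save_audio is an inference from the parameter descriptions, not documented behavior.

```python
from typing import Any, Dict, List

SUPPORTED_RESOLUTIONS = ("720p", "1080p")
KNOWN_FIELDS = {
    "video", "images", "instruction_prompt", "resolution", "fps",
    "save_audio", "ignore_audio", "disable_safety_checker", "seed", "no_op",
}


def check_animate_input(inputs: Dict[str, Any]) -> List[str]:
    """Return human-readable problems with an input dict; an empty list means none were found."""
    problems: List[str] = []
    unknown = sorted(set(inputs) - KNOWN_FIELDS)
    if unknown:
        # e.g. reference_image_prompt, which belongs to P-Image, not this request
        problems.append(f"unknown fields: {unknown}")
    if not inputs.get("video"):
        problems.append("video is required")
    images = inputs.get("images") or []
    if not images:
        problems.append("images needs at least one entry")
    elif len(images) > 1:
        problems.append("only images[0] is used; the other entries are ignored")
    resolution = inputs.get("resolution", "1080p")
    if resolution not in SUPPORTED_RESOLUTIONS:
        problems.append(f"resolution must be one of {', '.join(SUPPORTED_RESOLUTIONS)}, got {resolution!r}")
    fps = inputs.get("fps", 24)
    if isinstance(fps, bool) or not isinstance(fps, int) or fps <= 0:
        problems.append(f"fps must be a positive integer, got {fps!r}")
    if inputs.get("ignore_audio") and inputs.get("save_audio", True):
        # Assumption: ignore_audio yields a silent video, so save_audio has nothing to keep.
        problems.append("ignore_audio is true, so save_audio likely has no effect")
    return problems
```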

Argument recommendations

Use these patterns for consistent quality:

video: prefer stable exposure, minimal motion blur, and clear visibility of the motion you want to transfer.
images: a high-resolution, well-lit still (front-facing where possible) whose framing and pose match the driver video's first frame; only the first image is used.
resolution: iterate in 720p, then rerun finals in 1080p.
fps: match source footage when possible; default 24 is fine for most web delivery.
save_audio / ignore_audio: keep save_audio: true for dialogue-driven clips; set ignore_audio: true when you only need motion without sound.
seed: set for reproducible A/B tests; change one variable at a time.
disable_safety_checker: keep the default (`false`) unless your workflow already includes its own moderation step.
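The 720p-then-1080p workflow is easy to budget in advance. This Python sketch uses the prices on this page ($0.03 and $0.06 per second of output) and the quoted speed of about 5.24 s of generation per second of output. It also builds a draft and a final input that differ only in resolution. The page does not say how partial seconds are billed or which resolution the speed figure refers to. It also does not say whether the same seed at 1080p reproduces the 720p draft, so treat the draft as a preview.

```python
from typing import Any, Dict, Tuple

# From the pricing table on this page, in US cents per second of output video.
PRICE_CENTS_PER_SECOND = {"720p": 3, "1080p": 6}
# From the speed figure on this page; the resolution it applies to is not stated.
GENERATION_SECONDS_PER_OUTPUT_SECOND = 5.24


def estimate_run(duration_s: float, resolution: str) -> Dict[str, float]:
    """Rough cost and wall-clock estimate for one clip (billing rounding not documented)."""
    cost_usd = duration_s * PRICE_CENTS_PER_SECOND[resolution] / 100
    generation_s = duration_s * GENERATION_SECONDS_PER_OUTPUT_SECOND
    return {"cost_usd": round(cost_usd, 2), "generation_s": round(generation_s, 1)}


def draft_and_final(base_input: Dict[str, Any], seed: int) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return a 720p draft input and a 1080p final input that differ only in resolution."""
    draft = {**base_input, "resolution": "720p", "seed": seed}
    final = {**base_input, "resolution": "1080p", "seed": seed}
    return draft, final
```

For a 10-second clip, this estimates $0.30 at 720p and $0.60 at 1080p, with roughly 52 seconds of generation each.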