P-Video-Avatar

P-Video-Avatar is Pruna’s performance model for speech-driven avatar video from a single image—spokesperson-style clips with strong lip sync and multilingual speech.

Designed for professional results, it combines fast turnaround, script or uploaded audio, voice and language control, and P-Image-compatible start frames, so you can keep one approved look across ads, support, education, and localized variants.

Note

Some visuals shown here are inspired by, derived from brand assets, or reminiscent of representative brands across various industries and have been adapted for demo purposes. When using P-Video-Avatar, respect the copyright of images you use as input and of the speech and video you generate.

Pricing:

| Resolution | Price |
| --- | --- |
| 720p | $0.025 per second of output video |
| 1080p | $0.045 per second of output video |
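At these rates, cost scales linearly with clip length. A minimal sketch using the per-second prices from the table above (the helper name is ours):

```python
# Per-second pricing from the table above (USD).
PRICE_PER_SECOND = {"720p": 0.025, "1080p": 0.045}

def clip_cost(seconds: float, resolution: str = "720p") -> float:
    """Estimated cost in USD for one output clip."""
    return round(seconds * PRICE_PER_SECOND[resolution], 4)

# A 60-second spokesperson clip:
print(clip_cost(60, "720p"))   # 1.5
print(clip_cost(60, "1080p"))  # 2.7
```

Iterating in 720p and rerunning only the final cut in 1080p keeps most of the spend at the lower rate.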

Tip

Test it now in the P-Video-Avatar Playground.

Prompt formula

Biggest levers:

  1. Still — face and wardrobe consistency start in P-Image; always a single subject.

  2. Script — spoken language matches the voice locale; write for the ear.

  3. voice_prompt — performance only, not dialogue.

  4. video_prompt — use a fixed camera when lip sync drifts; a calm background beats busy motion.

Fast pass

Fewer constraints—good for first renders and timing checks.

[p-image start frame]: one line — age, outfit, soft light
[voice_script]: short lines, easy for TTS
[voice_prompt]: warm / calm + pace in a few words
[video_prompt]: steady shot, small gestures

Locked-in

Explicit rules—same look and motion across runs and longer clips.

[p-image start frame]: single subject, aspect, light direction, wardrobe & set spelled out
[voice_script]: full script, localized, pauses that sound natural aloud
[voice_prompt]: role + energy + what to avoid (hype, theatrical)
[video_prompt]: fixed camera; gesture zone; background static or soft blur; no pan/zoom/handheld if sync slips
  1. Still / first frame — The API takes an uploaded image as the first frame. start_image_prompt is not a request field; generate the still with P-Image (demographic, wardrobe, lighting, lens, framing, single subject), then upload the file as image.

    • Examples (P-Image prompt style): “Professional woman in her 30s, medium close-up, office window light”, “single subject, 9:16, soft key, direct eye contact”

  2. voice_script — What the avatar says, in the target language (or use uploaded audio instead).

    • Examples: “Welcome—let’s connect your data in under two minutes.”, short lines for TTS.

  3. voice_prompt — How it is said: tone, pace, energy, role—not the words of the script.

    • Examples: “warm support specialist, calm pace”, “no sales hype, clear consonants”

  4. video_prompt — On-camera motion: face, shoulders, hands; background behavior; usually fixed camera for stable lip sync.

    • Examples: “fixed eye-level shot, small hand gestures”, “soft office blur behind subject, no zoom”
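Taken together, the four slots map directly onto request input fields; a minimal illustrative bundle (values are examples from this page, and the image value must be a previously uploaded file URL):

```python
# The four prompt slots as p-video-avatar input fields (values illustrative).
avatar_input = {
    "image": "https://api.pruna.ai/v1/files/file-abc123",  # uploaded still
    "voice_script": "Welcome. Let's connect your data in under two minutes.",
    "voice_prompt": "warm support specialist, calm pace",
    "video_prompt": "fixed eye-level shot, small hand gestures",
}
print(sorted(avatar_input))
```

Keeping the bundle as one dictionary makes it easy to swap a single slot between runs while holding the rest constant.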

| Slot | Fast pass (enough to run) | Locked-in (stronger control) |
| --- | --- | --- |
| Still prompt (P-Image → image) | One line: who, outfit, basic light; single subject implied. | Single subject explicit; age, skin, hair, wardrobe, expression, light direction, set, lens + aspect—stable identity; matches voice + locale intent. |
| voice_script | Short lines, one beat per sentence, target language. | Ear-tested for TTS; rewrite per locale (do not ship English in a foreign voice). |
| voice_prompt | Tone + pace in a few words. | Role (host, coach, support), register, what to avoid (hype, theatrical); must match the still persona. |
| video_prompt | Steady shot, small motion, simple background. | Fixed camera when lip sync drifts; repeatable gestures; background static or soft blur; no pan / zoom / handheld if the mouth or frame wobbles. |

Tip

When you need strict speaking cadence and pronunciation, prefer uploaded audio over generated TTS.

Tip

For comprehensive video prompting (motion, framing, atmosphere), see the Video Generation guide.

Key Features

P-Video-Avatar fits the same Pruna API patterns as P-Video while specializing in talking-head generation:

Script or uploaded audio

Drive speech with voice_script and built-in voices, or upload audio for exact timing and pronunciation.

Multilingual voices

Match voice_language and voice to your region; keep the same start-frame identity across locales.

Lip-sync-friendly motion

Use video_prompt to describe stable framing, gestures, and background—optimized for clear mouth motion.

720p and 1080p output

Choose resolution per asset; cost scales per second of output video (see Pricing).

P-Image-aligned start frames

Generate the image still with P-Image using the same prompt habits as the P-Image documentation.

Practical constraints

  • We recommend clips under 3 minutes for best consistency.

  • Output aspect ratio follows the input image.

  • Very long clips may show gradual consistency drift over time. This is a current diffusion-model limitation across the industry.

Horizontal and vertical strategy

P-Video-Avatar inherits its aspect ratio from the input image:

  • Horizontal output: provide a landscape start frame (for example 16:9).

  • Vertical output: provide a portrait start frame (for example 9:16).

Generate landscape and portrait starts with P-Image so the first frame matches the aspect ratio you want in the final avatar clip (for example 16:9 for web hero cuts, 9:16 for social).
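Because the clip inherits the start frame's aspect, it can help to sanity-check orientation before uploading; a minimal sketch (the helper name is ours):

```python
def orientation(width: int, height: int) -> str:
    """Classify a start frame; the generated clip inherits this orientation."""
    if width > height:
        return "horizontal"
    if height > width:
        return "vertical"
    return "square"

print(orientation(1920, 1080))  # 16:9 web hero -> horizontal
print(orientation(1080, 1920))  # 9:16 social   -> vertical
```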

Identity + voice/language alignment

To keep examples realistic and coherent, each scenario aligns:

  • identity prompt (including gender and demographic descriptor),

  • voice gender (female/male voice),

  • voice_language (target locale/language),

  • frame format (horizontal/vertical input image).

Example alignment presets:

| Use case | Identity prompt (image) | voice | voice_language | Gender alignment | Format |
| --- | --- | --- | --- | --- | --- |
| SaaS onboarding | Black woman, professional spokesperson, direct camera engagement | Zephyr (Female) | English (US) | female image + female voice | horizontal |
| EU founder update | White woman, founder-style delivery, social-first framing | Kore (Female) | French | female image + female voice | vertical |
| Product manager explainer | East Asian man, product walkthrough style, concise delivery | Puck (Male) | Spanish | male image + male voice | vertical |
| Support tutorial | Male support agent, reassuring tone, instructional style | Charon (Male) | English (UK) | male image + male voice | horizontal |
| Education short | Female educator, calm teaching posture, high clarity | Aoede (Female) | Hindi | female image + female voice | vertical |
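Presets like the rows above are easy to keep as data so identity, voice, language, and format always move together; a sketch covering two of the rows (the dictionary keys are ours):

```python
# Two alignment presets from the table above (dictionary keys are ours).
PRESETS = {
    "saas_onboarding": {
        "voice": "Zephyr (Female)",
        "voice_language": "English (US)",
        "format": "horizontal",
    },
    "eu_founder_update": {
        "voice": "Kore (Female)",
        "voice_language": "French",
        "format": "vertical",
    },
}

def voice_fields(preset: str) -> dict:
    """Fields to merge into the request input for this preset."""
    p = PRESETS[preset]
    return {"voice": p["voice"], "voice_language": p["voice_language"]}

print(voice_fields("eu_founder_update"))
```

The `format` value is not an API field; it only records which start-frame orientation to generate with P-Image.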

Examples

One tab per domain, with side-by-side cards: video (poster matches the still), voice and resolution line, and copy-ready prompts. The still text is a P-Image prompt (see Domain Use Cases there for tone and structure). Use it in P-Image to create the frame, then upload that file as image in the API. The label start_image_prompt in the copy blocks is documentation shorthand—it is not a request field.

Integration

P-Video-Avatar uses the same Pruna prediction API as P-Video. You supply a portrait or spokesperson still (often from P-Image) as image, then drive speech with voice_script and voice fields—or override with audio for exact timing.

For the full p-video-avatar request and response reference, use P-Video-Avatar in the API guides.

Tip

For more information on how to use the API, see the API Reference.

API Endpoint

Base URL: https://api.pruna.ai/v1/predictions

Authentication

-H 'apikey: YOUR_API_KEY'

Step 1: Upload your avatar source image

curl -X POST "https://api.pruna.ai/v1/files" \
  -H "apikey: YOUR_API_KEY" \
  -F "content=@/path/to/portrait.jpg"

Use the returned file URL as image in generation requests.

Step 2: Create avatar generation request

Script + built-in TTS (asynchronous)

curl -X POST 'https://api.pruna.ai/v1/predictions' \
-H 'Content-Type: application/json' \
-H 'apikey: YOUR_API_KEY' \
-H 'Model: p-video-avatar' \
-d '{
  "input": {
    "image": "https://api.pruna.ai/v1/files/file-abc123",
    "voice_script": "Hello and welcome to our product demo.",
    "voice": "Zephyr (Female)",
    "voice_language": "English (US)",
    "voice_prompt": "Warm, energetic, sales presentation tone.",
    "video_prompt": "The person speaks with subtle hand gestures and a dynamic office background.",
    "resolution": "720p"
  }
}'

Script + built-in TTS (synchronous)

curl -X POST 'https://api.pruna.ai/v1/predictions' \
-H 'Content-Type: application/json' \
-H 'apikey: YOUR_API_KEY' \
-H 'Model: p-video-avatar' \
-H 'Try-Sync: true' \
-d '{
  "input": {
    "image": "https://api.pruna.ai/v1/files/file-abc123",
    "voice_script": "Welcome to your onboarding. Let us configure your first workflow.",
    "voice": "Puck (Male)",
    "resolution": "1080p",
    "seed": 42
  }
}'

Uploaded audio override

curl -X POST 'https://api.pruna.ai/v1/predictions' \
-H 'Content-Type: application/json' \
-H 'apikey: YOUR_API_KEY' \
-H 'Model: p-video-avatar' \
-d '{
  "input": {
    "image": "https://api.pruna.ai/v1/files/file-abc123",
    "audio": "https://api.pruna.ai/v1/files/file-audio456",
    "voice_script": "This text is ignored when audio is provided.",
    "video_prompt": "Natural body-camera engagement, slight camera push-in."
  }
}'
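The same requests can be issued from Python with only the standard library. A sketch that mirrors the curl calls above — it builds the request without sending it; pass the result to `urllib.request.urlopen` with a real key to send:

```python
import json
import urllib.request

API_URL = "https://api.pruna.ai/v1/predictions"

def avatar_request(input_fields: dict, api_key: str,
                   try_sync: bool = False) -> urllib.request.Request:
    """Build a p-video-avatar prediction request (does not send it)."""
    headers = {
        "Content-Type": "application/json",
        "apikey": api_key,
        "Model": "p-video-avatar",
    }
    if try_sync:
        headers["Try-Sync"] = "true"  # synchronous variant
    body = json.dumps({"input": input_fields}).encode("utf-8")
    return urllib.request.Request(API_URL, data=body, headers=headers,
                                  method="POST")

# Script + built-in TTS; if "audio" is also present, it takes priority.
req = avatar_request(
    {
        "image": "https://api.pruna.ai/v1/files/file-abc123",
        "voice_script": "Hello and welcome to our product demo.",
        "voice": "Zephyr (Female)",
        "resolution": "720p",
    },
    api_key="YOUR_API_KEY",
)
```

To send, call `urllib.request.urlopen(req)`; for the uploaded-audio override, add an `audio` field to the input dictionary instead of relying on `voice_script`.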

Configuration

Required parameters

| Parameter | Type | Description |
| --- | --- | --- |
| image | file/string | Input image (first frame). Supports jpg, jpeg, png, webp. |

You must also provide either voice_script or audio (or both, with audio taking priority).

Optional parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| audio | file/string | — | Uploaded audio URL used to drive speech and timing. |
| voice | string | "Zephyr (Female)" | Voice used for generated speech. |
| voice_script | string | "" | Script spoken when audio is not provided. |
| voice_prompt | string | "Say the following." | Speaking style instructions (tone, pacing, emotion). |
| voice_language | string | "English (US)" | Output language for generated speech. |
| video_prompt | string | "The person is talking." | Prompt controlling body movement, framing behavior, and atmosphere. |
| resolution | string | "720p" | Output resolution. Allowed values: 720p, 1080p. |
| seed | integer | random | Random seed for reproducible generations. |
| disable_safety_filter | boolean | true | Disables prompt/image safety checks when true. |
| disable_prompt_upsampling | boolean | false | Skip prompt upsampling and pass raw prompt text to the model. |

Supported option values

  • resolution: 720p, 1080p.

  • voice_language: English (US), English (UK), Spanish, French, German, Italian, Portuguese (Brazil), Japanese, Korean, Hindi.

  • voice: Zephyr (Female), Puck (Male), Charon (Male), Kore (Female), Fenrir (Male), Leda (Female), Orus (Male), Aoede (Female), Callirrhoe (Female), Autonoe (Female), Enceladus (Male), Iapetus (Male), Umbriel (Male), Algenib (Male), Despina (Female), Erinome (Female), Laomedeia (Female), Achernar (Female), Algieba (Male), Schedar (Male), Gacrux (Female), Pulcherrima (Female), Achird (Male), Zubenelgenubi (Male), Vindemiatrix (Female), Sadachbia (Male), Sadaltager (Male), Sulafat (Female), Alnilam (Male), Rasalgethi (Male).
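The rules above — image required, voice_script or audio required, two allowed resolutions — can be checked client-side before sending; a minimal sketch (the helper name is ours):

```python
ALLOWED_RESOLUTIONS = {"720p", "1080p"}

def validate_input(inp: dict) -> list:
    """Return a list of problems; an empty list means the input looks sendable."""
    problems = []
    if "image" not in inp:
        problems.append("image is required (first frame)")
    if not (inp.get("voice_script") or inp.get("audio")):
        problems.append("provide voice_script or audio (audio takes priority)")
    if inp.get("resolution", "720p") not in ALLOWED_RESOLUTIONS:
        problems.append("resolution must be 720p or 1080p")
    return problems

print(validate_input({"image": "file-abc123", "voice_script": "Hi"}))  # []
```

This catches the common failure modes locally instead of spending a round trip on a rejected request.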

Argument recommendations

Use these patterns for consistent quality:

  • image: the only first-frame input; use P-Image to create the still, then upload the file. Example bundles on this page label the still text start_image_prompt for readability—that name is not an API parameter.

  • seed: set when you need reproducible A/B variants; change only one variable at a time.

  • audio vs voice_script: prefer audio when exact timing/pronunciation is critical; otherwise use voice_script for speed and scale.

  • voice + voice_language: choose together and align persona with your start-frame identity.

  • voice_prompt: keep to delivery style only (tone, speed, emotion), not content.

  • video_prompt: use for movement/framing/background behavior; avoid re-stating the script.

  • resolution: iterate in 720p, then rerun final assets in 1080p.

  • disable_prompt_upsampling: set true for strict prompt control and reproducibility; keep false when you want automatic prompt enhancement.

  • disable_safety_filter: keep default behavior unless you have an explicit moderated workflow for disabled filtering.
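The seed recommendation above — fix the seed, then change one variable at a time — can be enforced in code; an illustrative sketch (field values are examples from this page):

```python
# A base input with a fixed seed for reproducible A/B comparisons.
BASE = {
    "image": "https://api.pruna.ai/v1/files/file-abc123",
    "voice_script": "Welcome to your onboarding.",
    "resolution": "720p",
    "seed": 42,  # fixed so only the changed field varies between runs
}

def variant(base: dict, **changes) -> dict:
    """Copy the base input, changing exactly one field."""
    assert len(changes) == 1, "change one variable at a time"
    return {**base, **changes}

b = variant(BASE, video_prompt="fixed eye-level shot, small hand gestures")
```

Because each variant differs from the base in exactly one field, any change in output can be attributed to that field.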