P-Video-Replace

P-Video-Replace replaces characters in an existing video using reference images and prompt-guided mappings.

Given a source video (the driver) and one or more reference stills, the model replaces characters in the footage with your reference identities. The output keeps the atmosphere of the video (background, blocking, camera, lighting) while swapping who appears on screen. Motion, acting, timing, and scene structure follow the driver. Use instruction_prompt to describe who replaces whom when multiple people are on screen.

It is optimized for:

- Top visual quality
- Most efficient inference
  - Speed: 3.58 s of generation time per 1 s of video (720p benchmark)
  - Price: $0.03 per second of 720p video, $0.06 per second of 1080p video

Not sure how this differs from P-Video-Animate, P-Video, or P-Video-Avatar? See P-Video-Animate vs. P-Video-Replace and Choosing the right video model below.

Note

When using P-Video-Replace, respect the copyright of videos and images you use as input and of the video you generate.

Pricing:

Resolution	Price
720p	$0.03 per second of output video
1080p	$0.06 per second of output video

Tip

Test it in the P-Video Playground.

Prompt formula

Fast pass

Reuse a P-Video-Animate driver + matching still—good for first swaps and timing checks.

[p-image still] [video] [instruction_prompt]

- [p-image still]: catalog still from the animate row
- [video]: the same row's driver clip
- [instruction_prompt]: one line: who → reference image 1 + keep motion/audio

Locked-in

Spell out who replaces whom, scene placement, and audio sync for repeatable runs.

[p-image still] [video] [instruction_prompt]

- [p-image still]: single subject, pose, wardrobe, lens + aspect spelled out
- [video]: same driver clip every run
- [instruction_prompt]: name driver subject + ref identity + match lip sync and audio

Fast pass — driver ↔ output compare; animate-catalog driver + still with a short instruction_prompt.

Locked-in — driver ↔ output compare; detailed still + instruction_prompt with explicit lip-sync and keep lines.

Still / reference image — The API uses the first entry in images. reference_image_prompt is not a request field; generate the still with P-Image (see Domain Use Cases there), then upload the file as images[0]. In our docs examples, each still is the matching P-Video-Animate catalog frame.
- Examples: “Photorealistic woman, 9:16 chest-up, soft window light, single subject”, “Stop-motion clay sailor, medium full-body, gray studio stool”
video — Required driver clip. Motion, acting, timing, lip sync, and camera movement come from this file. Pair with the animate row’s {slug}_driver.mp4 when reusing catalog examples.
- Examples: a winning UGC take, avatar output, or the driver from a P-Video-Animate example row.
instruction_prompt — Names who in the driver to replace with whom from reference image 1. Keep it to one short sentence plus a keep line (motion, audio, camera; add lip sync for dialogue drivers).
- Fast pass: “Replace the person in the source video with {character label} from reference image 1. Keep motion, audio, and camera from the source video.”
- Locked-in: “Replace the live-action man on the gray studio stool with the stop-motion clay sailor from reference image 1. Keep lip sync, motion, audio, and camera from the source video.”

Slot	Fast pass (enough to run)	Locked-in (stronger control)
Still prompt (P-Image → `images[0]`)	Reuse the animate-catalog still for that row, or one line: who, outfit, basic light.	Single subject explicit; pose, wardrobe, expression, light direction, set, lens + aspect—stable identity across reruns.
`video`	Animate-catalog driver for the motion you want to keep.	Same clip every run when comparing stills or prompt tweaks.
`instruction_prompt`	One line: driver subject → reference image 1 + keep motion/audio/camera.	Placement + identity mapping + sync line; call out props, blocking, and match lip sync and audio from the source video.

Note

Output size vs. driver size: resolution (720p / 1080p) sets the output megapixel budget, and the aspect ratio follows the driver. A 1080p replace may therefore render at a larger pixel size than a 720p driver uploaded as video; that is expected. Side-by-side compare clips in our gallery are normalized to the driver’s frame size (crop-to-fill, no letterboxing).
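To get a feel for why a 1080p output can be larger than the driver, the sketch below estimates output dimensions from a megapixel budget and the driver's aspect ratio. It assumes the approximate budgets listed under Configuration (720p ≈ 1 MP, 1080p ≈ 2 MP) and rounds to even pixel counts. The service's actual rounding is not documented here, so treat the result as an estimate. `estimate_output_size` is a hypothetical helper.

```python
import math

# Approximate budgets taken from the Configuration table (assumption: exact
# values and rounding used by the service may differ).
MEGAPIXEL_BUDGET = {"720p": 1_000_000, "1080p": 2_000_000}


def estimate_output_size(driver_width, driver_height, resolution="1080p", multiple=2):
    """Estimate (width, height) that fills the budget at the driver's aspect ratio."""
    budget = MEGAPIXEL_BUDGET[resolution]
    aspect = driver_width / driver_height
    height = math.sqrt(budget / aspect)
    width = height * aspect

    def snap(value):
        # Round to the nearest multiple (even numbers by default).
        return max(multiple, int(round(value / multiple)) * multiple)

    return snap(width), snap(height)


# A 720x1280 (9:16) driver rendered with resolution="1080p".
width, height = estimate_output_size(720, 1280, "1080p")
print(width, height)
```

For a 9:16 driver of 720x1280 pixels, the estimated 1080p output is wider and taller than the driver while keeping roughly the same aspect ratio, which matches the behavior described in the note.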

Tip

For comprehensive video prompting (motion, framing, atmosphere), see the Video Generation guide.

How does it differ from other models?

The current market includes general video editing/modification models that can perform broad video transformations. P-Video-Replace is designed specifically for character replacement workflows: replacing people in an existing video while preserving the original motion, camera movement, lighting, background, and scene structure.

Benchmark numbers below are directional and may vary depending on resolution, clip length, settings, provider, queue time, and test date.

P-Video-Animate vs. P-Video-Replace

Both models take a source video and reference image(s), but they are built for different workflows—not interchangeable substitutes.

	P-Video-Animate	P-Video-Replace
What it does	Animates one image using motion, timing, and camera movement from a driver clip.	Replaces the character(s) in a video with the character(s) from reference stills.
What atmosphere you keep	The image’s atmosphere—look, lighting, wardrobe, and world of the still drive the output.	The video’s atmosphere—background, blocking, camera, and scene of the footage drive the output.
When to use it	You have an approved hero still and want it to perform like an existing take.	You have finished footage and want different people in the same shot.

Rule of thumb: Choose Animate when the still defines the world; choose Replace when the clip defines the world.

Choosing the right video model

Pruna ships four performance video models. They share the same prediction API, but each solves a different production problem. P-Video-Replace uses the p-video-replace model (Model: p-video-replace header) and requires an existing source video.

	P-Video	P-Video-Avatar	P-Video-Animate	P-Video-Replace (this page)
One-line job	Generate new footage from prompts	Speak from one still (script or audio)	Retarget one still with clip motion	Swap characters in existing footage
You start with	Text prompt (+ optional image refs)	Portrait still + `voice_script` or `audio`	Source video + one still	Source video + 1–4 identity stills
You keep from the source	N/A (new scene)	Aspect ratio of the still	Motion, timing, camera movement, and optionally audio	Camera, timing, blocking, background
Typical ask	“Make a 10 s product ad in this style.”	“This spokesperson says this line in French.”	“Animate this catalog still using our winning ad take.”	“Put our creator in this UGC b-roll.”

Quick decision guide

No source video yet → use P-Video to create the plate, or P-Video-Avatar if you only need a talking head from a still.
Footage exists and the hero still should move like the driver → P-Video-Animate (only the first entry in images is used). See P-Video-Animate vs. P-Video-Replace.
Footage exists and you need different people in the same shot → P-Video-Replace (Model: p-video-replace; use instruction_prompt when multiple people are on screen).

Tip

Common pipelines: Generate stills with P-Image → P-Video-Avatar for new spokesperson clips → P-Video-Replace to drop talent into b-roll → P-Video-Animate to apply a hero still to motion from an avatar or ad clip.

Speed and throughput

Metric	P-Video-Replace (720p benchmark)
Generation time per 1 s of output	3.58 s
Cost (720p)	$0.03/s of output video
Cost (1080p)	$0.06/s of output video
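As a quick budgeting aid, the figures in the table can be combined into a back-of-the-envelope estimate. The sketch below is an assumption-laden illustration: `estimate_job` is a hypothetical helper, the prices come from the table above, and the 3.58 ratio is the 720p benchmark only (no 1080p timing is given on this page, and real timings vary as noted earlier).

```python
# Prices per second of output video, from the table above.
PRICE_PER_OUTPUT_SECOND = {"720p": 0.03, "1080p": 0.06}

# Seconds of generation per second of output (720p benchmark, directional).
BENCHMARK_GENERATION_RATIO = 3.58


def estimate_job(clip_seconds, resolution="720p"):
    """Return (estimated cost in USD, estimated generation time in seconds)."""
    cost_usd = round(clip_seconds * PRICE_PER_OUTPUT_SECOND[resolution], 2)
    # Assumption: the 720p benchmark ratio is applied regardless of resolution.
    generation_seconds = round(clip_seconds * BENCHMARK_GENERATION_RATIO, 1)
    return cost_usd, generation_seconds


cost, seconds = estimate_job(10, "720p")
print(f"10 s clip at 720p: about ${cost:.2f}, roughly {seconds} s to generate")
```

Because output length follows the source video, the driver's duration is the value to pass as `clip_seconds`.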

Key features

P-Video-Replace fits the same Pruna API patterns as P-Video and P-Video-Avatar. Typical use cases and capabilities:

UGC ad variations: Scale winning creatives by swapping in new creators, customers, or personas.
Viral meme remixes: Refresh trending clips with custom characters, avatars, or branded personas.
Movie scene recasting: Replace actors or characters with uploaded avatars, selfies, or character images.
Game cinematic variations: Personalize trailers or cutscenes with player avatars, skins, heroes, or custom characters.
Educational videos: Localize or personalize training videos by replacing speakers, instructors, or role-based characters.
Fast compared to existing replace models: Optimized for production pipelines that need turnaround without sacrificing usable quality on typical footage.
Multi-character swap: Upload a video and 1–4 reference stills with an optional instruction_prompt (Model: p-video-replace header).
Scene and blocking preservation: Keeps the driver clip’s camera, timing, layout, and background while swapping visible characters.
Reliable everyday motion: Strong on normal movement and slow, controlled action—walking, talking heads, presenters, product demos.
Audio-aware output: Control source audio with save_audio and ignore_audio (see Configuration).

Practical constraints

Output length follows the source video duration (within platform max length).
Output aspect ratio follows the source video.
1080p output can exceed the driver’s native pixel dimensions; preview at driver resolution when comparing before/after.
Very fast action, heavy occlusion, or extreme camera motion may reduce consistency.
Use a clean, well-lit reference still when possible.
Supports up to four references when multiple characters are on screen.

Examples

One tab per Hugging Face folder (ugc_ads, film_casting, gaming, meme_remixes), with side-by-side cards showing the reference image (image), driver (video), a driver ↔ output compare clip, resolution, and copy-ready instruction_prompt text. Create the reference still with P-Image (see Domain Use Cases there for tone and structure), then pass it as the first entry in images. Full assets live on prompt_guide/p-video-replace.

Reference image image

Driver video

Driver ↔ output

1080p, 9:16

Example prompts

instruction_prompt: Replace the woman in the yellow apron at the kitchen counter in the source video with the East Asian man in navy apron holding the teal meal kit box from reference image 1. The output character must be the man from reference image 1. Fill the full frame edge to edge with no black bars or letterboxing. Keep lip sync, motion, audio, and camera from the source video.

Reference image image

Driver video

Driver ↔ output

1080p, 9:16

Example prompts

instruction_prompt: Replace the person in the source video with 2D fitness mascot still (3/4 chest-up) from reference image 1. Keep lip sync, motion, audio, and camera from the source video.

Reference image image

Driver video

Driver ↔ output

1080p, 9:16

Example prompts

instruction_prompt: Replace the person in the source video with Black athlete still (straight-on chest-up) from reference image 1. Keep lip sync, motion, audio, and camera from the source video.

Reference image image

Driver video

Driver ↔ output

1080p, 9:16

Example prompts

instruction_prompt: Replace the person in the source video with South Asian creator still (over-shoulder) from reference image 1. Keep lip sync, motion, audio, and camera from the source video.

Reference image image

Driver video

Driver ↔ output

1080p, 9:16

Example prompts

instruction_prompt: Replace the red-haired woman in the bedroom source video with the Filipina woman in rust orange linen robe from reference image 1. The output character must be the Filipina woman from reference image 1. Reference image 1 is identity only. The amber glass serum dropper bottle in the output must be the exact original bottle from the source video and reference image 2—same glass shape, amber color, cap, label, grip, and lift toward camera around 2 seconds. Do not invent a new bottle. Keep the bedroom background, lip sync, motion, audio, and camera from the source video.

Reference image image

Driver video

Driver ↔ output

1080p, 9:16

Example prompts

instruction_prompt: Replace the person in the source video with 3D cel-shaded mascot still (face macro) from reference image 1. Keep lip sync, motion, audio, and camera from the source video.

Reference image image

Driver video

Driver ↔ output

1080p, 9:16

Example prompts

instruction_prompt: Replace the person in the source video with Middle Eastern student still (profile) from reference image 1. Keep lip sync, motion, audio, and camera from the source video.

Reference image image

Driver video

Driver ↔ output

1080p, 9:16

Example prompts

instruction_prompt: Replace the person in the source video with redhead creator still (straight-on vanity) from reference image 1. Keep lip sync, motion, audio, and camera from the source video.

Reference image image

Driver video

Driver ↔ output

1080p, 9:16

Example prompts

instruction_prompt: Replace the man with curly hair in the gray sleeveless hoodie at the neon gaming desk in the source video with the anime-style streamer with teal hair in the violet hoodie and magenta headset from reference image 1. The output character must be the illustrated anime character from reference image 1. Do not change the chair, desk, monitors, or background. Keep lip sync, motion, audio, and camera from the source video.

Reference image image

Driver video

Driver ↔ output

1080p, 16:9

Example prompts

instruction_prompt: Replace the person in the source video with animated heroine (3/4 chest-up) from reference image 1. Keep lip sync, motion, audio, and camera from the source video.

Reference image image

Driver video

Driver ↔ output

1080p, 16:9

Example prompts

instruction_prompt: Replace the live-action man on the gray studio stool with the stop-motion clay sailor from reference image 1. Keep lip sync, motion, audio, and camera from the source video.

Reference image image

Driver video

Driver ↔ output

1080p, 16:9

Example prompts

instruction_prompt: Replace the person in the source video with watercolor knight (straight-on chest-up) from reference image 1. Keep lip sync, motion, audio, and camera from the source video.

Reference image image

Driver video

Driver ↔ output

1080p, 16:9

Example prompts

instruction_prompt: Replace the person in the source video with frost-mage commander skin (3/4 over map) from reference image 1. Keep motion, audio, and camera from the source video.

Reference image image

Driver video

Driver ↔ output

1080p, 16:9

Example prompts

instruction_prompt: Replace the person in the source video with ice paladin (low hero) from reference image 1. Keep motion, audio, and camera from the source video.

Reference image image

Driver video

Driver ↔ output

1080p, 16:9

Example prompts

instruction_prompt: Replace the person in the source video with crystal dragon (worm's eye) from reference image 1. Keep motion, audio, and camera from the source video.

Reference image image

Driver video

Driver ↔ output

1080p, 16:9

Example prompts

instruction_prompt: Replace the person in the source video with crystal knight skin (low hero) from reference image 1. Keep motion, audio, and camera from the source video.

Reference image image

Driver video

Driver ↔ output

1080p, 9:16

Example prompts

instruction_prompt: Replace the person in the source video with violet arcanist (chest-up) from reference image 1. Keep motion, audio, and camera from the source video.

Reference image image

Driver video

Driver ↔ output

1080p, 9:16

Example prompts

instruction_prompt: Replace the person in the source video with cyber rogue skin (chest-up) from reference image 1. Keep motion, audio, and camera from the source video.

Reference image image

Driver video

Driver ↔ output

1080p, 9:16

Example prompts

instruction_prompt: Replace the woman in the coral athletic jacket on the white runway studio in the source video with the red-haired man in green medieval armor and cape from reference image 1. The output performer must be the man from reference image 1. Keep motion, audio, and camera from the source video.

Reference image image

Driver video

Driver ↔ output

1080p, 16:9

Example prompts

instruction_prompt: Replace the person in the source video with oil-painting noble still (3/4 full-body) from reference image 1. Keep motion, audio, and camera from the source video.

Reference image image

Driver video

Driver ↔ output

1080p, 16:9

Example prompts

instruction_prompt: Replace the person in the source video with capybara barista still (chest-up) from reference image 1. Keep motion, audio, and camera from the source video.

Reference image image

Driver video

Driver ↔ output

1080p, 16:9

Example prompts

instruction_prompt: Replace the person in the source video with photoreal dog founder still from reference image 1. Keep motion, audio, and camera from the source video.

Reference image image

Driver video

Driver ↔ output

1080p, 9:16

Example prompts

instruction_prompt: Replace the person in the source video with freckled thrift shopper still (chest-up) from reference image 1. Keep motion, audio, and camera from the source video.

Reference image image

Driver video

Driver ↔ output

1080p, 9:16

Example prompts

instruction_prompt: Replace the person in the source video with home cook still (chest-up) from reference image 1. Keep motion, audio, and camera from the source video.

Reference image image

Driver video

Driver ↔ output

1080p, 9:16

Example prompts

instruction_prompt: Replace the person in the source video with rain-boot ferret still (full-body) from reference image 1. Keep motion, audio, and camera from the source video.

Integration

P-Video-Replace uses the same Pruna prediction API as P-Video. Upload video and images, set Model: p-video-replace, then poll or use sync headers as with other video models.

Tip

For more information on how to use the API, see the API Reference.

API endpoint: https://api.pruna.ai/v1/predictions

Authentication

-H 'apikey: YOUR_API_KEY'

Step 1: Upload source video and reference image

curl -X POST "https://api.pruna.ai/v1/files" \
  -H "apikey: YOUR_API_KEY" \
  -F "content=@/path/to/source.mp4"

curl -X POST "https://api.pruna.ai/v1/files" \
  -H "apikey: YOUR_API_KEY" \
  -F "content=@/path/to/reference.jpg"

Use the returned file URLs as the video value and as the entries in images.
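For Python pipelines, the same upload can be sketched with the standard library. This mirrors the curl calls above (multipart field content, apikey header). `build_upload_request` is a hypothetical helper, and the response format is not shown on this page, so the sketch stops at building the request rather than parsing a reply.

```python
import mimetypes
import urllib.request
import uuid

FILES_URL = "https://api.pruna.ai/v1/files"


def build_upload_request(api_key, filename, file_bytes):
    """Build a multipart/form-data POST with one file part named 'content'."""
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="content"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        FILES_URL,
        data=head + file_bytes + tail,
        headers={
            "apikey": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )


# Example with placeholder bytes; in practice read them from source.mp4 or reference.jpg.
request = build_upload_request("YOUR_API_KEY", "reference.jpg", b"\xff\xd8\xff")
print(request.full_url, request.get_method())
```

Passing the request to `urllib.request.urlopen` would perform the upload. Read the returned file URL as described in the API Reference.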

Step 2: Create generation request

Replace mode (asynchronous)

curl -X POST 'https://api.pruna.ai/v1/predictions' \
-H 'Content-Type: application/json' \
-H 'apikey: YOUR_API_KEY' \
-H 'Model: p-video-replace' \
-d '{
  "input": {
    "video": "https://api.pruna.ai/v1/files/file-driver123",
    "images": [
      "https://api.pruna.ai/v1/files/file-still-a"
    ],
    "instruction_prompt": "Replace the person in the source video with the clay sailor (medium full-body) from reference image 1. Keep lip sync, motion, audio, and camera from the source video.",
    "resolution": "1080p",
    "save_audio": true
  }
}'
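The same request can be assembled in Python. The sketch below mirrors the curl call above using only the standard library. `build_replace_request` is a hypothetical helper, and polling and the response schema are not covered on this page, so they are left out.

```python
import json
import urllib.request

PREDICTIONS_URL = "https://api.pruna.ai/v1/predictions"


def build_replace_request(api_key, video_url, image_urls, instruction_prompt="",
                          resolution="1080p", save_audio=True):
    """Build the POST request for a replace prediction (not sent here)."""
    body = {
        "input": {
            "video": video_url,
            "images": list(image_urls),
            "instruction_prompt": instruction_prompt,
            "resolution": resolution,
            "save_audio": save_audio,
        }
    }
    return urllib.request.Request(
        PREDICTIONS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "apikey": api_key,
            # The model is selected by header; no mode field goes in the input.
            "Model": "p-video-replace",
        },
        method="POST",
    )


request = build_replace_request(
    "YOUR_API_KEY",
    "https://api.pruna.ai/v1/files/file-driver123",
    ["https://api.pruna.ai/v1/files/file-still-a"],
    instruction_prompt=(
        "Replace the person in the source video with the clay sailor "
        "(medium full-body) from reference image 1. Keep lip sync, motion, "
        "audio, and camera from the source video."
    ),
)
print(request.get_method(), request.full_url)
```

Sending it with `urllib.request.urlopen(request)` submits the job. How to poll for or receive the result is described in the API Reference.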

Configuration

Required parameters

Parameter	Type	Description
video	file/string	Source RGB video (`.mp4`). Motion, timing, and camera source.
images	file[] / string[]	Reference image(s): 1–4 identity references.

Optional parameters

Parameter	Type	Default	Description
mode	string	—	Do not send in `input` for replace—use the `Model: p-video-replace` header instead.
instruction_prompt	string	`""`	Further instruction on how to place people from reference images into the scene.
resolution	string	`1080p`	Target megapixel budget: `720p` ≈ 1 MP, `1080p` ≈ 2 MP (aspect ratio preserved).
fps	integer	`24`	Frames per second of the output video.
save_audio	boolean	`true`	Save the video with audio.
ignore_audio	boolean	`false`	Ignore source audio for prompt conditioning and return a silent output video.
disable_safety_checker	boolean	`false`	Disable safety checker for generated videos (platform UI may still enforce checks).
seed	integer	random	Random seed. Leave blank for random.
no_op	boolean	`false`	Health check mode — returns status without inference.

Supported option values

resolution: 720p, 1080p.
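A client-side check can catch common mistakes before a request is sent. The sketch below encodes only the constraints stated in the tables above (required video, 1–4 images, the two resolution values, integer fps, no mode in input). `validate_replace_input` is a hypothetical helper, and the server may apply additional rules not listed here.

```python
ALLOWED_RESOLUTIONS = {"720p", "1080p"}


def validate_replace_input(replace_input):
    """Return a list of problems; an empty list means no issue was found."""
    problems = []
    if not replace_input.get("video"):
        problems.append("video is required")
    images = replace_input.get("images") or []
    if not 1 <= len(images) <= 4:
        problems.append("images must contain 1 to 4 references")
    # Defaults below follow the optional-parameters table.
    if replace_input.get("resolution", "1080p") not in ALLOWED_RESOLUTIONS:
        problems.append("resolution must be 720p or 1080p")
    fps = replace_input.get("fps", 24)
    if isinstance(fps, bool) or not isinstance(fps, int) or fps <= 0:
        problems.append("fps must be a positive integer")
    if "mode" in replace_input:
        problems.append("do not send mode; set the Model: p-video-replace header")
    return problems


print(validate_replace_input({"video": "file-url", "images": ["still-url"]}))
```

Returning a list of messages rather than raising on the first error makes it easier to report every problem in one pass.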

Argument recommendations

Use these patterns for consistent quality:

Model header: p-video-replace (required; do not send mode in input).
video: prefer stable exposure, minimal motion blur, and clear visibility of subjects you want to replace.
images: high-resolution, well-lit faces or full-body shots matching the intended framing; supports up to four references.
instruction_prompt: name who in the driver maps to reference image 1; mention wardrobe or props when identity drifts.
resolution: iterate in 720p, then rerun finals in 1080p.
fps: match source footage when possible; default 24 is fine for most web delivery.
save_audio / ignore_audio: keep save_audio: true for dialogue-driven clips; set ignore_audio: true when you only need motion without sound.
seed: set for reproducible A/B tests; change one variable at a time.
disable_safety_checker: leave default unless your workflow includes explicit moderation.
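The seed recommendation above (fixed seed, one variable changed at a time) can be sketched as a helper that produces input variants differing in a single field. `ab_variants` is a hypothetical name. Whether a fixed seed gives comparable results across different settings is not stated on this page, so treat that as an assumption.

```python
def ab_variants(base_input, field, values, seed=1234):
    """Return one input dict per value, identical except for `field`."""
    variants = []
    for value in values:
        variant = dict(base_input)  # shallow copy; base_input is not modified
        variant["seed"] = seed      # same seed for every variant
        variant[field] = value      # the single variable under test
        variants.append(variant)
    return variants


base = {
    "video": "https://api.pruna.ai/v1/files/file-driver123",
    "images": ["https://api.pruna.ai/v1/files/file-still-a"],
    "resolution": "720p",  # iterate at 720p, then rerun finals at 1080p
}
for variant in ab_variants(base, "instruction_prompt", ["Prompt A", "Prompt B"]):
    print(variant["seed"], variant["instruction_prompt"])
```

Each returned dict can be used as the input object of a prediction request. Comparing outputs is then limited to the one field that changed.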