P-Video-Replace

P-Video-Replace replaces characters in an existing video using reference images and prompt-guided mappings.

Given a source video and one to four reference stills, the model replaces characters in the footage with your reference identities. The output keeps the atmosphere of the video (background, blocking, camera, lighting) while swapping who appears on screen. Motion, acting, timing, and scene structure follow the driver (the source video). Use instruction_prompt to describe who replaces whom when multiple people are on screen.

It is optimized for:

  • Top visual quality

  • Most efficient inference

    • Speed: 3.58 s of generation time per 1 s of output video (720p benchmark)

    • Price: $0.03 per 1 s of 720p video and $0.06 per 1 s of 1080p video

Not sure how this differs from P-Video-Animate, P-Video, or P-Video-Avatar? See P-Video-Animate vs. P-Video-Replace and Choosing the right video model below.

Note

When using P-Video-Replace, respect the copyright of videos and images you use as input and of the video you generate.

Pricing:

| Resolution | Price |
| --- | --- |
| 720p | $0.03 per second of output video |
| 1080p | $0.06 per second of output video |

Tip

Test it in the P-Video Playground.

Prompt formula

Fast pass

Reuse a P-Video-Animate driver and its matching still. Good for first swaps and timing checks.

[p-image still] [video] [instruction_prompt]

  • p-image still: catalog still from the animate row

  • video: the same row's driver clip

  • instruction_prompt: one line, who → reference image 1 + keep motion/audio

Locked-in

Spell out who replaces whom, scene placement, and audio sync for repeatable runs.

[p-image still] [video] [instruction_prompt]

  • p-image still: single subject, with pose, wardrobe, lens + aspect spelled out

  • video: same driver clip every run

  • instruction_prompt: name driver subject + ref identity + match lip sync and audio

[Video: Fast pass, driver ↔ output compare; animate-catalog driver + still with a short instruction_prompt.]

[Video: Locked-in, driver ↔ output compare; detailed still + instruction_prompt with explicit lip-sync and keep lines.]

  1. Still / reference image — The API uses the first entry in images. reference_image_prompt is not a request field; generate the still with P-Image (see Domain Use Cases there), then upload the file as images[0]. In our docs examples, each still is the matching P-Video-Animate catalog frame.

    • Examples: “Photorealistic woman, 9:16 chest-up, soft window light, single subject”, “Stop-motion clay sailor, medium full-body, gray studio stool”

  2. video — Required driver clip. Motion, acting, timing, lip sync, and camera movement come from this file. Pair with the animate row’s {slug}_driver.mp4 when reusing catalog examples.

    • Examples: a winning UGC take, avatar output, or the driver from a P-Video-Animate example row.

  3. instruction_prompt — Names who in the driver to replace with whom from reference image 1. Keep it to one short sentence plus a keep line (motion, audio, camera; add lip sync for dialogue drivers).

    • Fast pass: “Replace the person in the source video with {character label} from reference image 1. Keep motion, audio, and camera from the source video.”

    • Locked-in: “Replace the live-action man on the gray studio stool with the stop-motion clay sailor from reference image 1. Keep lip sync, motion, audio, and camera from the source video.”
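
The prompt pattern in item 3 can be filled in programmatically. The sketch below is a minimal illustration: build_instruction_prompt and its arguments are hypothetical names, not part of the Pruna API. It only formats the string you would pass as instruction_prompt, following the example prompts in this guide.

```python
def build_instruction_prompt(driver_subject, reference_label, dialogue=False):
    """Format a one-sentence mapping plus a keep line.

    Hypothetical helper: it mirrors the example prompts in this guide
    and is not an official Pruna function.
    """
    # Dialogue drivers get an explicit lip-sync item in the keep line.
    keep = "lip sync, motion, audio, and camera" if dialogue else "motion, audio, and camera"
    return (
        f"Replace {driver_subject} with {reference_label} from reference image 1. "
        f"Keep {keep} from the source video."
    )


prompt = build_instruction_prompt(
    "the live-action man on the gray studio stool",
    "the stop-motion clay sailor",
    dialogue=True,
)
print(prompt)
```

With these arguments the helper reproduces the Locked-in example sentence above. Leaving dialogue at its default drops the lip-sync item, which matches the Fast pass wording.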

| Slot | Fast pass (enough to run) | Locked-in (stronger control) |
| --- | --- | --- |
| Still prompt (P-Image → images[0]) | Reuse the animate-catalog still for that row, or one line: who, outfit, basic light. | Single subject explicit; pose, wardrobe, expression, light direction, set, lens + aspect, for stable identity across reruns. |
| video | Animate-catalog driver for the motion you want to keep. | Same clip every run when comparing stills or prompt tweaks. |
| instruction_prompt | One line: driver subject → reference image 1 + keep motion/audio/camera. | Placement + identity mapping + sync line; call out props and blocking, and match lip sync and audio from the source video. |

Note

Output size vs. driver size: resolution (720p / 1080p) sets the output megapixel budget and aspect ratio follows the driver. A 1080p replace may render at a higher pixel size than a 720p driver uploaded as video—that is expected. Side-by-side compare clips in our gallery are normalized to the driver’s frame size (crop-to-fill, no letterboxing).
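
As a rough illustration of the note above, the sketch below estimates output dimensions from the driver's aspect ratio and a pixel budget. The exact budgets and the rounding rule are assumptions (this guide only states that 720p is about 1 MP and 1080p about 2 MP), and estimate_output_size is a hypothetical helper, so treat the result as an approximation rather than the service's actual sizing logic.

```python
import math

# Assumed pixel budgets for illustration: 1280x720 and 1920x1080.
# The guide only says "about 1 MP" and "about 2 MP".
PIXEL_BUDGET = {"720p": 1280 * 720, "1080p": 1920 * 1080}


def estimate_output_size(driver_width, driver_height, resolution="1080p"):
    """Estimate (width, height) that fills the budget at the driver's aspect ratio."""
    budget = PIXEL_BUDGET[resolution]
    # Scale both sides by the same factor so the aspect ratio is preserved.
    scale = math.sqrt(budget / (driver_width * driver_height))
    # Rounding to even dimensions is an assumption (common for video codecs).
    width = 2 * round(driver_width * scale / 2)
    height = 2 * round(driver_height * scale / 2)
    return width, height


print(estimate_output_size(1280, 720, "1080p"))
print(estimate_output_size(720, 1280, "720p"))
```

Under these assumptions a 1280×720 driver rendered at 1080p comes out at 1920×1080, which is larger than the driver, as the note describes.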

Tip

For comprehensive video prompting (motion, framing, atmosphere), see the Video Generation guide.

How does it differ from other models?

The current market includes general video editing/modification models that can perform broad video transformations. P-Video-Replace is designed specifically for character replacement workflows: replacing people in an existing video while preserving the original motion, camera movement, lighting, background, and scene structure.

Benchmark numbers below are directional and may vary depending on resolution, clip length, settings, provider, queue time, and test date.

P-Video-Animate vs. P-Video-Replace

Both models take a source video and reference image(s), but they are built for different workflows—not interchangeable substitutes.

| | P-Video-Animate | P-Video-Replace |
| --- | --- | --- |
| What it does | Animates one image using motion, timing, and camera movement from a driver clip. | Replaces the character(s) in a video with the character(s) from reference stills. |
| What atmosphere you keep | The image’s atmosphere: the look, lighting, wardrobe, and world of the still drive the output. | The video’s atmosphere: the background, blocking, camera, and scene of the footage drive the output. |
| When to use it | You have an approved hero still and want it to perform like an existing take. | You have finished footage and want different people in the same shot. |

Rule of thumb: Choose Animate when the still defines the world; choose Replace when the clip defines the world.

Choosing the right video model

Pruna ships four performance video models. They share the same prediction API, but each solves a different production problem. P-Video-Replace uses the p-video-replace model (Model: p-video-replace header) and requires an existing source video.

| | P-Video | P-Video-Avatar | P-Video-Animate | Replace (this page) |
| --- | --- | --- | --- | --- |
| One-line job | Generate new footage from prompts | Speak from one still (script or audio) | Retarget one still with clip motion | Swap characters in existing footage |
| You start with | Text prompt (+ optional image refs) | Portrait still + voice_script or audio | Source video + one still | Source video + 1–4 identity stills |
| You keep from the source | N/A (new scene) | Aspect ratio of the still | Motion, timing, camera movement, and optionally audio | Camera, timing, blocking, background |
| Typical ask | “Make a 10 s product ad in this style.” | “This spokesperson says this line in French.” | “Animate this catalog still using our winning ad take.” | “Put our creator in this UGC b-roll.” |

Quick decision guide

  • No source video yet → use P-Video to create the plate, or P-Video-Avatar if you only need a talking head from a still.

  • Footage exists and the hero still should move like the driver → P-Video-Animate (only the first entry in images is used). See P-Video-Animate vs. P-Video-Replace.

  • Footage exists and you need different people in the same shot → P-Video-Replace (Model: p-video-replace; use instruction_prompt when multiple people are on screen).
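
The decision guide can be written down as a small lookup, which may help when routing jobs in a pipeline. This is only a sketch: choose_model and the goal labels are hypothetical names invented for this example, and the function returns the model names used in this guide, not header values (only p-video-replace is given here as a header value).

```python
def choose_model(has_source_video, goal):
    """Return the model suggested by the quick decision guide.

    `goal` is a hypothetical label used only in this sketch:
    "new_scene", "talking_head", "animate_still", or "swap_people".
    """
    if not has_source_video:
        # No footage yet: generate a plate, or a talking head from a still.
        return "P-Video-Avatar" if goal == "talking_head" else "P-Video"
    if goal == "animate_still":
        # The hero still should move like the driver.
        return "P-Video-Animate"
    if goal == "swap_people":
        # Different people in the same shot.
        return "P-Video-Replace"
    raise ValueError(f"unrecognized goal for existing footage: {goal!r}")


print(choose_model(True, "swap_people"))
```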

Tip

Common pipelines: Generate stills with P-Image → P-Video-Avatar for new spokesperson clips → P-Video-Replace to drop talent into b-roll → P-Video-Animate to apply a hero still to motion from an avatar or ad clip.

Speed and throughput

| Metric | P-Video-Replace (720p benchmark) |
| --- | --- |
| Generation time per 1 s of output | 3.58 s |
| Cost (720p) | $0.03/s of output video |
| Cost (1080p) | $0.06/s of output video |
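
To budget a batch, the figures in the table can be turned into a quick estimate. The sketch below uses the listed prices and the 720p benchmark rate. estimate_job is a hypothetical helper, queue time is not included, and applying the 720p rate to 1080p runs is an assumption, since no 1080p timing is given here.

```python
# Figures taken from the table in this guide; the benchmark rate is directional.
PRICE_PER_SECOND_USD = {"720p": 0.03, "1080p": 0.06}
GENERATION_SECONDS_PER_OUTPUT_SECOND = 3.58  # 720p benchmark


def estimate_job(duration_s, resolution="1080p"):
    """Return (cost in USD, rough generation time in seconds) for one clip."""
    cost = round(duration_s * PRICE_PER_SECOND_USD[resolution], 2)
    # Assumption: the 720p benchmark rate is reused for every resolution.
    generation_time = round(duration_s * GENERATION_SECONDS_PER_OUTPUT_SECOND, 2)
    return cost, generation_time


cost_720, time_720 = estimate_job(10, "720p")
cost_1080, time_1080 = estimate_job(10, "1080p")
print(cost_720, time_720)
print(cost_1080, time_1080)
```

For a 10 s clip this gives about $0.30 at 720p and $0.60 at 1080p, with roughly 36 s of generation time under the benchmark rate.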

Key features

P-Video-Replace fits the same Pruna API patterns as P-Video and P-Video-Avatar:

UGC ad variations

Scale winning creatives by swapping in new creators, customers, or personas.

Viral meme remixes

Refresh trending clips with custom characters, avatars, or branded personas.

Movie scene recasting

Replace actors or characters with uploaded avatars, selfies, or character images.

Game cinematic variations

Personalize trailers or cutscenes with player avatars, skins, heroes, or custom characters.

Educational videos

Localize or personalize training videos by replacing speakers, instructors, or role-based characters.

Fast compared to existing replace models

Optimized for production pipelines that need turnaround without sacrificing usable quality on typical footage.

Multi-character swap

Upload a video and 1–4 reference stills with an optional instruction_prompt (Model: p-video-replace header).

Scene and blocking preservation

Keeps the driver clip’s camera, timing, layout, and background while swapping visible characters.

Reliable everyday motion

Strong on normal movement and slow, controlled action—walking, talking heads, presenters, product demos.

Audio-aware output

Control source audio with save_audio and ignore_audio (see Configuration).

Practical constraints

  • Output length follows the source video duration (within platform max length).

  • Output aspect ratio follows the source video.

  • 1080p output can exceed the driver’s native pixel dimensions; preview at driver resolution when comparing before/after.

  • Very fast action, heavy occlusion, or extreme camera motion may reduce consistency.

  • Use a clean, well-lit reference still when possible.

  • Supports up to four references when multiple characters are on screen.

Examples

One tab per Hugging Face folder (ugc_ads, film_casting, gaming, meme_remixes), with side-by-side cards showing the reference image (image), driver (video), a driver ↔ output compare clip, resolution, and copy-ready instruction_prompt text. Create the reference still with P-Image (see Domain Use Cases there for tone and structure), then pass it as the first entry in images. Full assets live on prompt_guide/p-video-replace.

Integration

P-Video-Replace uses the same Pruna prediction API as P-Video. Upload video and images, set Model: p-video-replace, then poll or use sync headers as with other video models.

Tip

For more information on how to use the API, see the API Reference.

API Endpoint

Base URL: https://api.pruna.ai/v1/predictions

Authentication

-H 'apikey: YOUR_API_KEY'

Step 1: Upload source video and reference image

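# Upload the source (driver) video
curl -X POST "https://api.pruna.ai/v1/files" \
  -H "apikey: YOUR_API_KEY" \
  -F "content=@/path/to/source.mp4"

# Upload the reference still (repeat for each additional reference, up to four)
curl -X POST "https://api.pruna.ai/v1/files" \
  -H "apikey: YOUR_API_KEY" \
  -F "content=@/path/to/reference.jpg"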

Use the returned file URLs as video and entries in images.

Step 2: Create generation request

Replace mode (asynchronous)

curl -X POST 'https://api.pruna.ai/v1/predictions' \
-H 'Content-Type: application/json' \
-H 'apikey: YOUR_API_KEY' \
-H 'Model: p-video-replace' \
-d '{
  "input": {
    "video": "https://api.pruna.ai/v1/files/file-driver123",
    "images": [
      "https://api.pruna.ai/v1/files/file-still-a"
    ],
    "instruction_prompt": "Replace the person in the source video with the clay sailor (medium full-body) from reference image 1. Keep lip sync, motion, audio, and camera from the source video.",
    "resolution": "1080p",
    "save_audio": true
  }
}'
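
For callers working in Python, the same request can be assembled with the standard library. The sketch below builds the request object without sending it. build_request is a hypothetical helper, the file URLs are the placeholders from the curl example, and the response schema is not described in this guide, so response parsing is left out.

```python
import json
import urllib.request

API_URL = "https://api.pruna.ai/v1/predictions"


def build_request(api_key, video_url, image_urls, instruction_prompt="", resolution="1080p"):
    """Build (but do not send) the POST request shown in the curl example."""
    body = {
        "input": {
            "video": video_url,
            "images": list(image_urls),
            "instruction_prompt": instruction_prompt,
            "resolution": resolution,
            "save_audio": True,
        }
    }
    headers = {
        "Content-Type": "application/json",
        "apikey": api_key,
        "Model": "p-video-replace",  # selects the model; do not put "mode" in input
    }
    return urllib.request.Request(
        API_URL, data=json.dumps(body).encode("utf-8"), headers=headers, method="POST"
    )


request = build_request(
    "YOUR_API_KEY",
    "https://api.pruna.ai/v1/files/file-driver123",
    ["https://api.pruna.ai/v1/files/file-still-a"],
    instruction_prompt="Replace the person in the source video with the clay sailor "
    "from reference image 1. Keep motion, audio, and camera from the source video.",
)
# To send it, call urllib.request.urlopen(request) with a real API key.
```

Keeping construction separate from sending makes it easy to inspect or log the payload before spending credits on a run.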

Configuration

Required parameters

| Parameter | Type | Description |
| --- | --- | --- |
| video | file/string | Source RGB video (.mp4). Motion, timing, and camera source. |
| images | file[] / string[] | Reference image(s). Replace: 1–4 identity references. |

Optional parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| mode | string | - | Do not send in input for replace; use the Model: p-video-replace header instead. |
| instruction_prompt | string | "" | Further instruction on how to place people from reference images into the scene. |
| resolution | string | 1080p | Target megapixel budget: 720p ≈ 1 MP, 1080p ≈ 2 MP (aspect ratio preserved). |
| fps | integer | 24 | Frames per second of the output video. |
| save_audio | boolean | true | Save the video with audio. |
| ignore_audio | boolean | false | Ignore source audio for prompt conditioning and return a silent output video. |
| disable_safety_checker | boolean | false | Disable safety checker for generated videos (platform UI may still enforce checks). |
| seed | integer | random | Random seed. Leave blank for random. |
| no_op | boolean | false | Health check mode: returns status without inference. |

Supported option values

  • resolution: 720p, 1080p.
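
A client-side pre-check can catch the most common mistakes before a request is sent. The sketch below encodes only the constraints stated in this section (required video, 1 to 4 images, the two resolution values, no mode in input). validate_input is a hypothetical helper, and the server's own validation may be stricter or differ.

```python
ALLOWED_RESOLUTIONS = {"720p", "1080p"}


def validate_input(payload_input):
    """Return a list of problems found in the `input` object; empty means OK.

    Client-side pre-check only; it does not replace server validation.
    """
    problems = []
    if not payload_input.get("video"):
        problems.append("video is required")
    images = payload_input.get("images") or []
    if not 1 <= len(images) <= 4:
        problems.append("images must contain 1 to 4 references")
    # resolution is optional and defaults to 1080p.
    if payload_input.get("resolution", "1080p") not in ALLOWED_RESOLUTIONS:
        problems.append("resolution must be 720p or 1080p")
    if "mode" in payload_input:
        problems.append("do not send mode; use the Model: p-video-replace header")
    return problems


print(validate_input({"video": "https://api.pruna.ai/v1/files/file-driver123", "images": []}))
```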

Argument recommendations

Use these patterns for consistent quality:

  • Model header: p-video-replace (required; do not send mode in input).

  • video: prefer stable exposure, minimal motion blur, and clear visibility of subjects you want to replace.

  • images: high-resolution, well-lit faces or full-body shots matching the intended framing; supports up to four references.

  • instruction_prompt: name who in the driver maps to reference image 1; mention wardrobe or props when identity drifts.

  • resolution: iterate in 720p, then rerun finals in 1080p.

  • fps: match source footage when possible; default 24 is fine for most web delivery.

  • save_audio / ignore_audio: keep save_audio: true for dialogue-driven clips; set ignore_audio: true when you only need motion without sound.

  • seed: set for reproducible A/B tests; change one variable at a time.

  • disable_safety_checker: leave default unless your workflow includes explicit moderation.
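
The "iterate in 720p, then rerun finals in 1080p" and "change one variable at a time" advice can be kept honest with a small copy-and-override helper. This is a sketch only: make_variant is a hypothetical name, the seed value is arbitrary, and the file URLs are the placeholders used elsewhere in this guide. Whether a fixed seed yields matching results across resolutions is not stated here, so compare outputs rather than assume it.

```python
import copy


def make_variant(base_input, **overrides):
    """Copy an input object and change only the named fields."""
    variant = copy.deepcopy(base_input)
    variant.update(overrides)
    return variant


draft = {
    "video": "https://api.pruna.ai/v1/files/file-driver123",
    "images": ["https://api.pruna.ai/v1/files/file-still-a"],
    "instruction_prompt": (
        "Replace the person in the source video with the clay sailor from "
        "reference image 1. Keep motion, audio, and camera from the source video."
    ),
    "resolution": "720p",
    "seed": 42,  # arbitrary fixed seed so reruns are comparable
}

# Final pass: same clip, still, prompt, and seed; only the resolution changes.
final = make_variant(draft, resolution="1080p")
print(final["resolution"], final["seed"])
```

Because the helper deep-copies the draft, the original 720p settings stay untouched for further A/B tests.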