Full guide

This comprehensive guide teaches you how to craft compelling prompts for AI video generation. Master the art of video prompt engineering to bring motion, timing, and narrative flow to your creative visions.

Note

All example videos in this guide were generated using the Pruna-optimized WAN video models on Replicate.

What is a video generation prompt?

A video prompt serves as your creative blueprint for motion: it is the textual instruction that guides an AI model to generate a specific video. Think of it as directing a cinematographer who can capture anything you can describe.

Effective video prompt engineering involves strategically describing movement, timing, and visual flow so that your vision is communicated clearly across multiple frames. The precision of your language directly influences the quality and coherence of the generated output.

Prompting principles for video generation

Master these fundamental principles to create compelling prompts that generate better, more coherent video results.

| ✅ DO | ❌ DON’T |
| --- | --- |
| Use descriptive, direct language: “Camera slowly panning across a serene lake at sunset” | Use command-style instructions: “Please create a video of a lake at sunset” |
| Focus on motion and action: “Waves crashing against rocky shore with dramatic motion” | Describe static scenes: “A rocky shore with waves” |
| Be specific about timing: “Slow motion of water droplets splashing” | Ignore pacing: “Water splashing” |
| Include camera movements: “Tracking shot following the cyclist down the winding road” | Omit camera work details: “Cyclist on a road” |
| Use dynamic verbs: “Running, flying, swimming, dancing” | Use passive language: “A person who is moving” |
| Specify temporal elements: “Gradually, slowly, quickly, building to crescendo” | Leave timing ambiguous: “Scene changes” |
| Include atmospheric details: “Misty morning with soft diffused light” | Ignore mood and atmosphere: “Morning scene” |

Tip

Successful video prompts follow a simple formula: Subject(s) + Action(s) + Scene(s) + (Camera Movement(s) + Lighting(s) + Style). The first three are essential; the last three are optional enhancements.

Creating a video generation prompt

Every well-crafted video prompt contains three essential components:

  1. Subject(s): What or who is the main focus of the video

  2. Action(s): What the subject(s) are doing

  3. Scene(s): Where the action(s) are taking place

Optional enhancements that add to the narrative and visual appeal:

  1. Camera Movement(s): How the scene(s) are filmed

  2. Lighting(s): Lighting that enhances atmosphere and mood

  3. Style: The visual tone and aesthetic

[Subject 1] [Action 1] [Scene] [Camera Movement] [Lighting] [Style]
Cinematic shot of a knitted purple prune character, dancing joyfully through, a large pink and purple room, camera starting in front, then circling and ending behind the character showing a race car, warm golden hour lighting creating soft illumination, cinematic animation style
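
The formula can also be written as a few lines of Python. This is a minimal sketch, not part of any library: the `VideoPrompt` class and its field names are hypothetical, chosen to mirror the six components above. It joins the components with commas in the order of the template and rejects a prompt that is missing any of the three essential parts.

```python
from dataclasses import dataclass


@dataclass
class VideoPrompt:
    """Hypothetical container mirroring the six prompt components."""
    subject: str
    action: str
    scene: str
    camera: str = ""    # optional enhancement
    lighting: str = ""  # optional enhancement
    style: str = ""     # optional enhancement

    def render(self) -> str:
        # Subject, action, and scene are essential: fail early if one is blank.
        for name in ("subject", "action", "scene"):
            if not getattr(self, name).strip():
                raise ValueError(f"missing essential component: {name}")
        parts = [self.subject, self.action, self.scene,
                 self.camera, self.lighting, self.style]
        # Optional components that were left empty are skipped.
        return ", ".join(p.strip() for p in parts if p.strip())


prompt = VideoPrompt(
    subject="Cinematic shot of a knitted purple prune character",
    action="dancing joyfully through",
    scene="a large pink and purple room",
    camera=("camera starting in front, then circling and ending "
            "behind the character showing a race car"),
    lighting="warm golden hour lighting creating soft illumination",
    style="cinematic animation style",
)
print(prompt.render())
```

Rendering this instance reproduces the example prompt above. Keeping the components in separate fields makes it easy to change one of them, such as the lighting, while holding the rest fixed.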

Step 1: Define the subject(s)

Begin by identifying the main focus of your video - whether it’s a person, animal, object, or scene. Be as descriptive as possible to guide the AI model toward generating the exact subject you envision.

Subject Specification Guidelines:

  • Human Subjects: Include age range, gender, clothing style, physical features, facial expressions, and body postures

  • Animal Subjects: Specify breed/variety, size, coloration, behavior, and habitat

  • Objects: Detail materials, dimensions, condition, and placement

What to Include:

  • Primary subject: The main focus of the video

  • Additional subjects: Supporting subjects that enhance the narrative

  • Appearance: Physical features, hairstyle, clothing, accessories

  • Body postures: Stance and positioning

  • Physical characteristics: Size, proportions, distinctive features


"Cinematic shot of a knitted purple prune character"

Step 2: Describe the action(s)

What is the subject doing? This is the core of your prompt, as it drives the video’s storyline. The action describes the motion, movement, and activity that occurs throughout the video.

Action Specification Guidelines:

  • Motion Verbs: Use dynamic verbs like “running”, “flowing”, “transforming”, “dancing”, “soaring”, “colliding”

  • Tempo and Pacing: Specify timing with terms like “slow motion”, “fast-paced”, “building to crescendo”, “gradually”

  • Direction and Flow: Indicate movement direction like “from left to right”, “circling around”, “approaching”

  • Intensity: Describe energy level with terms like “energetic”, “peaceful”, “dramatic”, “intense”

  • Continuity: Define whether the action is “continuous”, “sequential”, “transitioning”, or “building”

What to Include:

  • Primary action: The main movement or activity

  • Additional actions: Multiple sequential or simultaneous actions (e.g., “wakes up, then puts on glasses, then drinks coffee”)

  • Motion quality: How the movement feels (smooth, jerky, fluid, explosive)

  • Temporal elements: Speed, duration, and progression of action
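
Sequential actions like the “wakes up, then puts on glasses, then drinks coffee” example can be assembled from a list, which keeps the order explicit when you add or remove steps. The helper below is an illustrative sketch; the name `sequence_actions` is made up for this guide.

```python
def sequence_actions(actions: list[str]) -> str:
    """Join actions in order, separated by ', then '."""
    steps = [a.strip() for a in actions if a.strip()]
    if not steps:
        raise ValueError("at least one action is required")
    return ", then ".join(steps)


print(sequence_actions(["wakes up", "puts on glasses", "drinks coffee"]))
# wakes up, then puts on glasses, then drinks coffee
```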


"[...] dancing joyfully through the room"

Step 3: Establish the scene(s)

Where is the action taking place? The scene provides the environmental context that sets the stage for your video’s narrative. This includes the foreground, background, and spatial elements that define where the action unfolds.

Scene Specification Guidelines:

  • Location Type: Specify the environment like “urban street”, “mountain peak”, “underwater cave”, “space station”

  • Spatial Elements: Describe the setting including architecture, landscape, furniture, or structural features

  • Environmental Conditions: Include weather, atmospheric effects, time of day, seasonal elements

  • Contextual Details: Add ambient elements like crowds, furniture, natural features, or architectural details

  • Depth and Scale: Define the scope with terms like “vast landscape”, “intimate space”, “panoramic view”

What to Include:

  • Foreground elements: Objects, people, or features in the immediate action space

  • Background elements: Distant scenery, architecture, natural features that provide context

  • Environmental mood: The overall feeling of the space (cozy, vast, industrial, natural)

  • Spatial relationships: How elements relate to each other in the scene


"[...] a large pink and purple room with puddles on the floor"

Step 4: Specify camera movement(s) (optional)

How is the scene filmed? Camera movement defines the visual perspective and how the viewer experiences the action. These techniques guide the audience’s attention and create dynamic visual narratives.

Camera Movement Guidelines:

  • Tracking Techniques: “following”, “tracking”, “pursuing” - camera moves with the subject

  • Orbital Movements: “circling around”, “orbiting”, “rotating view” - camera moves around the subject in a circular path

  • Zoom Effects: “zooming in/out”, “approaching”, “backing away” - camera distance changes

  • Panning and Tilting: “panning left/right”, “tilting up/down” - camera rotates horizontally or vertically

  • Aerial Perspectives: “bird’s eye view”, “overhead shot”, “flying camera” - elevated camera positions

  • Handheld Effects: “shaky”, “handheld”, “documentary style” - simulating camera operator movement

What to Include:

  • Movement type: “following”, “circling”, “flying”, “pushing in”

  • Direction and path: Where the camera moves (up, down, around, through)

  • Speed and pacing: How quickly the camera moves (“slowly”, “rapidly”, “gradually”)

  • Transition effects: “starting”, “ending”, “smoothly transitioning” between positions

  • Multiple techniques: You can combine multiple camera movements for complex shots


"[...] camera starting in front, then circling and ending behind the character"

Step 5: Define lighting(s) (optional)

What lighting conditions enhance the mood and atmosphere? Lighting is a powerful tool for setting emotional tone, creating depth, and directing attention in your video. It affects how viewers perceive and feel about the scene.

Lighting Specification Guidelines:

  • Natural Lighting: “golden hour”, “blue hour”, “daylight”, “sunset”, “dawn”, “natural window lighting”

  • Artificial Lighting: “streetlights”, “neon glow”, “spotlight”, “studio lighting”, “LED strips”

  • Lighting Character: “soft”, “harsh”, “diffused”, “dramatic”, “warm”, “cool”, “natural”

  • Lighting Direction: “front-lit”, “backlit”, “side-lit”, “top-down”, “rim lighting”

  • Shadow Effects: “dramatic shadows”, “soft shadows”, “high contrast”, “shadow play”, “dancing shadows”

  • Atmospheric Effects: “flickering”, “shifting”, “glowing”, “luminescent”, “ethereal illumination”

What to Include:

  • Light source: “golden hour”, “candlelight”, “neon signs”, “streetlamps”

  • Lighting quality: “soft”, “dramatic”, “harsh”, “diffused”, “warm”, “cool”

  • Shadow characteristics: “dramatic shadows”, “soft shadows”, “long shadows”

  • Mood and atmosphere: “mysterious”, “warm”, “intimate”, “dramatic”, “peaceful”

  • Time and setting: “morning light”, “midnight”, “sunset colors”, “candlelit”


"[...] warm golden hour lighting creating soft illumination"

Step 6: Establish style (optional)

What is the visual tone and aesthetic? Style encompasses the artistic approach, emotional tone, and overall visual language of your video. It determines how the content looks and feels to viewers.

Style Specification Guidelines:

  • Animation Styles: “anime”, “3D rendered”, “cel-shaded”, “motion graphics”, “stop motion”

  • Realism Levels: “photorealistic”, “hyper-realistic”, “lifelike”, “documentary style”

  • Artistic Approaches: “oil painting”, “watercolor”, “sketch”, “graphic novel”, “comic book style”

  • Visual Aesthetics: “cyberpunk”, “vintage”, “modern minimalist”, “retro”, “futuristic”

  • Cinematic Styles: “cinematic”, “Hollywood blockbuster”, “indie film”, “documentary”, “music video”

  • Emotional Tones: “whimsical”, “dramatic”, “peaceful”, “energetic”, “mysterious”, “inspirational”

  • Cultural Styles: “American comics”, “Japanese animation”, “European art house”, “documentary film”

What to Include:

  • Visual medium: “3D animation”, “2D illustration”, “live-action”, “painted”

  • Artistic movement: “impressionist”, “expressionist”, “minimalist”, “surrealist”

  • Genre aesthetics: “sci-fi”, “fantasy”, “noir”, “cyberpunk”, “retro”

  • Emotional atmosphere: “serene”, “dramatic”, “whimsical”, “intense”, “calming”

  • Technical quality: “cinematic”, “professional”, “studio quality”, “film grain”
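
One way to see what a lighting or style term contributes is to hold the subject, action, and scene fixed and vary only those terms. The sketch below builds one prompt per combination. The function name `style_variants` is hypothetical, and the terms are taken from the lists in this guide; which combinations work well depends on the model.

```python
from itertools import product


def style_variants(base: str, lightings: list[str], styles: list[str]) -> list[str]:
    """Return one prompt per (lighting, style) pair, keeping the base fixed."""
    return [f"{base}, {lighting}, {style}"
            for lighting, style in product(lightings, styles)]


variants = style_variants(
    "Knitted purple prune character dancing joyfully through a large pink and purple room",
    lightings=["warm golden hour lighting", "neon glow"],
    styles=["cinematic animation style", "stop motion", "watercolor"],
)
for variant in variants:
    print(variant)
```

Two lighting terms and three style terms give six prompts that can be generated and compared side by side.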


"[...] cinematic animation style"

Video generation prompt categories

Understanding how specific words and phrases affect your generated videos is essential for crafting effective prompts. Each term you include shapes the visual output in broadly predictable ways. This section explains not just which terms to use, but also what temporal effects they create and how they influence motion and pacing.

Visual style vocabulary

Visual style terms control the artistic medium, rendering technique, and overall aesthetic approach of your generated videos. These keywords transform how subjects appear in motion and what mood the video conveys.

"3D animated knitted purple prune character running from left to right across urban rooftops stopping in the middle of the screen to wave towards the camera, smooth motion capture, whimsical and energetic mood, fast-paced action"

| Category | Visual Effect |
| --- | --- |
| Three-dimensional rendering | “3D rendered”, “CGI animation”, and “computer-generated” establish a three-dimensional appearance; “polygonal models” adds geometric structure |
| Motion capture quality | “smooth motion capture” and “realistic animation” create natural movement; “professional rigging” ensures proper articulation; “fluid motion” adds smoothness |
| Dynamic camera work | “camera following with dynamic angles” provides tracking shots; “orbital camera” creates circular motion; “flying camera” establishes fluid movement |
| Textured materials | “realistic textures”, “procedural materials”, and “shader effects” add surface detail; “reflective surfaces” creates glossy appearances |
| Lighting systems | “global illumination”, “ray-traced lighting”, and “volumetric lighting” create realistic illumination; “dramatic shadows” adds depth |
| Action sequences | “parkour movements”, “acrobatic stunts”, and “dynamic action” establish energetic motion; “fast-paced” creates intensity; “building to climax” adds progression |
| Architectural visualization | “modern buildings”, “urban environments”, and “architectural details” establish spatial context; “flying through interior” creates immersive motion |

Subject matter vocabulary

Subject matter terms specify what appears in your video - the environments, activities, objects, and contexts that create your visual narrative. These terms define the content and setting of your generated videos, working alongside visual style to create complete compositions.

"Martial artist performing complex kata in dojo, camera circling around the practitioner, dramatic lighting creating shadows, intense and focused atmosphere, rhythmic timing"

| Category | Visual Effect |
| --- | --- |
| Combat movements | “martial arts”, “fighting”, and “combat sequences” create violent action; “karate moves” establishes specific techniques; “combat choreography” adds professional quality |
| High-energy motion | “fast-paced action”, “dynamic movement”, and “rapid sequences” establish energetic intensity; “explosive” creates sudden bursts |
| Stunt sequences | “acrobatic stunts”, “parkour”, and “extreme sports” add athletic performance; “daredevil” creates risk; “precise movements” adds control |
| Dramatic lighting | “dramatic shadows”, “high contrast lighting”, and “intense illumination” create mood; “backlighting” adds silhouette effects |
| Camera techniques | “camera circling”, “tracking shots”, and “dynamic angles” create motion; “close-up to wide” adds variety |
| Atmospheric intensity | “tense”, “focused”, and “dramatic” establish mood; “building to climax” creates progression |

Video format vocabulary

Video format terms control the structure, length, camera technique, and complexity of your generated videos. These keywords determine how your video unfolds and how the content is organized.

"Knitted purple prune character stretching in sunbeam, slow motion, warm lighting, peaceful mood"

| Category | Visual Effect |
| --- | --- |
| Concise content | “brief”, “simple”, and “focused” create clarity; “to the point” establishes efficiency |
| Single focus | “one subject”, “single action”, and “simple scene” create focus; “unified content” establishes simplicity |
| Clear action | “straightforward motion”, “direct movement”, and “simple activity” create clarity; “defined action” establishes purpose |
| Essential elements | “core components”, “key details”, and “essential information” create completeness; “necessary elements” establishes sufficiency |
| Quick execution | “fast generation”, “efficient”, and “rapid” create speed; “streamlined” establishes quickness |
| Immediate impact | “instant appeal”, “clear message”, and “direct communication” create effectiveness |
| Simple framing | “basic camera work”, “straightforward framing”, and “uncomplicated” create accessibility |

Advanced prompting strategies

Master these sophisticated techniques to refine your video generation and achieve more precise results.

Chaining videos

Video chaining extends a video by using the last frame of a generated clip as the starting image for a new image-to-video generation. This technique lets you build longer sequences and connect multiple video segments into a continuous narrative.

How video chaining works:

  1. Extract the last frame: After generating your initial video, extract the final frame

  2. Use as image input: Feed this last frame into an image-to-video model as the starting frame

  3. Continue the sequence: The model will generate new motion extending from that last frame
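
The first step can be done with any tool that reads video files. As one possible approach (this guide does not prescribe a tool), the sketch below builds an `ffmpeg` command that seeks to one second before the end of the clip and keeps overwriting a single output image, so the final frame is what remains on disk. It assumes `ffmpeg` is installed and that the clip is longer than one second. The function names and file names are illustrative.

```python
import subprocess
from pathlib import Path


def last_frame_command(video: str, image: str) -> list[str]:
    """Build an ffmpeg command that saves the final frame of `video` to `image`."""
    return [
        "ffmpeg", "-y",    # overwrite the output file if it exists
        "-sseof", "-1",    # input option: start one second before end of file
        "-i", video,
        "-update", "1",    # write every decoded frame to the same image file
        image,
    ]


def extract_last_frame(video: str, image: str) -> Path:
    """Run ffmpeg (assumed to be on PATH) and return the image path."""
    subprocess.run(last_frame_command(video, image), check=True)
    return Path(image)


# Print the command rather than running it, so the sketch works without ffmpeg.
print(" ".join(last_frame_command("segment_1.mp4", "segment_1_last.png")))
```

Writing to PNG avoids adding compression artifacts to the frame before it is fed into the next generation.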

You can also incorporate image generation or image editing models into this workflow to make controlled changes to the scene between video segments:

  • Image generation: Use text-to-image models to create entirely new starting frames that match your narrative

  • Image editing: Use image editing models to make specific modifications to the last frame before feeding it to the image-to-video model

This gives you precise control over scene transitions, allowing you to gradually transform scenes, introduce new elements, or make targeted changes to specific parts of the frame.

Example workflow:

  1. Generate initial video: “A character walking through a forest”

  2. Extract the last frame showing the character at the forest edge

  3. Edit the frame to add a doorway in the background

  4. Use the edited frame to generate next segment: “The character approaching and entering a magical doorway”
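
The workflow above can be sketched as a loop. Because the model calls depend on the provider you use, this example takes them as plain functions: `text_to_video`, `image_to_video`, `last_frame`, and `edit_frame` are hypothetical placeholders, and the stubs below only manipulate file names so that the control flow can be run and inspected without any model.

```python
from typing import Callable, Optional


def chain_segments(
    first_prompt: str,
    next_prompts: list[str],
    text_to_video: Callable[[str], str],
    image_to_video: Callable[[str, str], str],
    last_frame: Callable[[str], str],
    edit_frame: Optional[Callable[[str], str]] = None,
) -> list[str]:
    """Generate a chain of segments; each one starts from the previous last frame."""
    videos = [text_to_video(first_prompt)]
    for prompt in next_prompts:
        frame = last_frame(videos[-1])   # step 2: extract the last frame
        if edit_frame is not None:
            frame = edit_frame(frame)    # step 3: optional controlled edit
        videos.append(image_to_video(frame, prompt))  # step 4: next segment
    return videos


# Stub callables: replace each with a real model or tool call.
videos = chain_segments(
    "A character walking through a forest",
    ["The character approaching and entering a magical doorway"],
    text_to_video=lambda prompt: "segment_0.mp4",
    image_to_video=lambda frame, prompt: frame.replace(".png", "_next.mp4"),
    last_frame=lambda video: video.replace(".mp4", "_last.png"),
    edit_frame=lambda frame: frame.replace(".png", "_edited.png"),
)
print(videos)
```

Passing `edit_frame=None` gives plain chaining with no edit between segments.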

Cheaper and faster iteration

Iterating on video generation can be time-consuming and expensive. A more efficient workflow combines image-to-video models with image generation or image editing models, allowing you to rapidly refine your scene before committing to video generation.

Why this approach works:

  • Faster iteration: Generating and editing images is much faster than generating full videos

  • Lower costs: Image generation/editing operations cost significantly less than video generation

  • More control: You can precisely adjust composition, lighting, style, and elements before animating

  • Better results: Refining the base image leads to better starting frames for video generation

Example workflow:

  1. Generate initial image: Use a text-to-image model to create your starting scene

  2. Edit and refine: Use image editing models to make targeted adjustments

  3. Generate video: Once you have the perfect starting image, use it with an image-to-video model
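
A short sketch makes the saving concrete: all refinement happens on images, and the video model is called exactly once at the end. The three model functions are hypothetical stand-ins passed in as arguments. The stubs here only count calls and rename files, so no real costs or timings are implied.

```python
from collections import Counter
from typing import Callable


def refine_then_animate(
    image_prompt: str,
    edits: list[str],
    motion_prompt: str,
    generate_image: Callable[[str], str],
    edit_image: Callable[[str, str], str],
    animate: Callable[[str, str], str],
) -> str:
    """Refine a starting image with cheap edits, then animate it once."""
    image = generate_image(image_prompt)         # step 1: starting scene
    for instruction in edits:
        image = edit_image(image, instruction)   # step 2: targeted adjustments
    return animate(image, motion_prompt)         # step 3: single video call


calls = Counter()


def stub_generate(prompt: str) -> str:
    calls["image"] += 1
    return "scene_v0.png"


def stub_edit(image: str, instruction: str) -> str:
    calls["edit"] += 1
    return f"scene_v{calls['edit']}.png"


def stub_animate(image: str, prompt: str) -> str:
    calls["video"] += 1
    return image.replace(".png", ".mp4")


video = refine_then_animate(
    "A serene lake at sunset",
    ["add mist over the water", "warm the lighting"],
    "Camera slowly panning across the lake",
    stub_generate, stub_edit, stub_animate,
)
print(video, dict(calls))
```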

Troubleshooting common issues

Even with well-crafted prompts, you may encounter issues. Here’s how to address common problems:

| Problem | Solution | Check | Try |
| --- | --- | --- | --- |
| Video lacks motion | Add dynamic verbs and movement descriptors | Whether you described static scenes | Include “running”, “flowing”, “transforming” |
| Poor pacing control | Specify timing elements | If pacing feels off | Add “slow motion”, “fast-paced”, “building intensity” |
| Limited camera variety | Include camera movement details | Whether camera work is mentioned | Add “tracking”, “circling”, “panning” |
| Confusing transitions | Clarify scene progression | For multi-scene content | Specify “smooth transitions” and “connecting scenes” |
| Inconsistent quality | Add quality enhancement terms | Technical specifications | Include “cinematic”, “professional”, “high-quality” |
| Unclear focus | Simplify and focus on core elements | Too many conflicting elements | Reduce to essential components |
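
Some of these checks can be automated before you spend a generation on a prompt. The sketch below is a deliberately crude keyword check, assuming that a prompt containing none of the suggested terms for a problem may be at risk of it. The `CHECKS` mapping and the `lint_prompt` name are illustrative, and the term lists hold only the terms suggested above, so a prompt can be flagged even when it uses other perfectly good motion or camera words.

```python
# Problems from the troubleshooting list, mapped to their suggested terms.
CHECKS = {
    "Video lacks motion": ("running", "flowing", "transforming"),
    "Poor pacing control": ("slow motion", "fast-paced", "building intensity"),
    "Limited camera variety": ("tracking", "circling", "panning"),
}


def lint_prompt(prompt: str) -> list[str]:
    """Return the problems whose suggested terms are all absent from the prompt."""
    text = prompt.lower()
    return [problem for problem, terms in CHECKS.items()
            if not any(term in text for term in terms)]


print(lint_prompt("A rocky shore with waves"))
print(lint_prompt("Tracking shot of a river flowing in slow motion"))
```

The first prompt is flagged for all three problems and the second for none. Extending the term lists with vocabulary from the earlier sections would reduce false alarms.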

Next steps