LoRA Training and Inference
This page explains what LoRAs are, how to prepare datasets for LoRA training, and how to use LoRA weights at inference time. For full parameter lists, see P-Image and P-Image-Edit.
Two workflows (don’t mix them)
| Workflow | Use case | Training | Inference |
|---|---|---|---|
| P-Image LoRA | trigger word + prompt → image | p-image-trainer | p-image-lora |
| P-Image-Edit LoRA | trigger word + input image (+ prompt) → edited image | p-image-edit-trainer | p-image-edit-lora |
P-Image LoRA: train with p-image-trainer, run inference with p-image-lora. End-to-end notebook: P-Image LoRA: Training and Inference.
P-Image-Edit LoRA: train with p-image-edit-trainer, run inference with p-image-edit-lora. End-to-end notebook: P-Image-Edit LoRA: Training and Inference.
What are LoRAs?
LoRA (Low-Rank Adaptation) is a lightweight way to fine-tune diffusion models. Instead of retraining the full model, LoRA trains a small set of extra weights that are applied on top of the base model at inference time.
Benefits
Small files — LoRA weights are often a few dozen MB, not full model size.
Faster training — Less compute than full fine-tuning.
Swappable — Use different LoRAs on the same base model without reloading.
Workflow
LoRA training — Upload a dataset to the trainer endpoint (P-Image LoRA: p-image-trainer; P-Image-Edit LoRA: p-image-edit-trainer). The trainer produces LoRA weights (e.g. hosted on Hugging Face).
LoRA inference — Call the inference endpoint (P-Image LoRA: p-image-lora; P-Image-Edit LoRA: p-image-edit-lora) with `lora_weights` set to your weights URL. Optionally set `lora_scale` to control strength (0–1+).
Text-to-image: dataset preparation
Use this format for P-Image LoRA training (p-image-trainer) — custom concepts, characters, products, or styles.
Note
You can find a properly formatted text-to-image dataset for stylized generation on Hugging Face here:
Folder structure
Your dataset must be a single folder (zipped for upload). Each image can have a matching caption file with the same base filename.
dataset/
├── image_001.jpg
├── image_001.txt
├── image_002.png
├── image_002.txt
├── image_003.webp
├── image_003.txt
Rules
Every image can have a `.txt` caption file (e.g. `photo.txt` for `photo.jpg`). If any captions are missing, set `default_caption` in the trainer or training will fail.
Use at least 10 images, preferably more. Quality and diversity matter more than raw quantity.
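Before uploading, it can help to sanity-check the folder against these rules and zip it in one step. A minimal sketch using only the Python standard library; the helper name and printed warnings are illustrative, not part of the trainer API:

```python
import pathlib
import shutil

IMAGE_EXTS = {".jpg", ".png", ".webp"}

def check_and_zip(dataset_dir: str) -> str:
    """Warn about rule violations, then zip the folder for upload."""
    root = pathlib.Path(dataset_dir)
    images = [p for p in root.iterdir() if p.suffix.lower() in IMAGE_EXTS]
    if len(images) < 10:
        print(f"warning: only {len(images)} images; at least 10 recommended")
    missing = [p.name for p in images if not p.with_suffix(".txt").exists()]
    if missing:
        # Images without captions require default_caption to be set in the trainer.
        print("images without captions:", sorted(missing))
    # make_archive appends .zip to the base name and archives the folder contents
    return shutil.make_archive(root.name, "zip", root_dir=root)
```

`shutil.make_archive` writes the zip into the current working directory, so run the script from wherever you want the upload artifact to land.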
Caption guidelines
Describe what you want the model to learn. Use clear, natural language; multi-line is allowed.
For a person, character, product, or specific visual concept, use a unique trigger word in every caption (e.g. `tok_my_concept` or `sks_my_concept`). Use it consistently so you can prompt the trained model with that same token later.
Dataset size recommendations
| Use case | Recommended images |
|---|---|
| Person / character | 15–40 |
| Product | 20–50 |
| Style / aesthetic | 30–100 |
| General concept | 50+ |
Image requirements
Minimum resolution 512×512; formats `.jpg`, `.png`, `.webp`.
Prefer varied angles, lighting, backgrounds, and poses. Avoid duplicates, watermarks, or low-resolution images.
What not to include
Nested folders, missing or mismatched caption files, reused captions, unlicensed or disallowed content.
Image editing: dataset preparation
Use this format for P-Image-Edit LoRA training (p-image-edit-trainer) — inpainting, instruction-based editing. Editing requires paired data: original image, edited target, and optionally a mask and instruction.
Note
You can find properly formatted image-editing datasets on Hugging Face here:
Folder structure
Each example is identified by a shared base filename.
dataset/
├── edit_001_input.png
├── edit_001_target.png
├── edit_001.txt
├── edit_002_input.jpg
├── edit_002_target.jpg
├── edit_002.txt
Optional (inpainting / masked editing)
dataset/
├── edit_003_input.png
├── edit_003_target.png
├── edit_003_mask.png
├── edit_003.txt
Naming rules
`*_input` (or `*_start`) → original image
`*_target` (or `*_end`) → edited result (what the model should learn to produce)
`*_mask` → optional binary mask (white = editable area, black = frozen). Same resolution as the input; `.png` recommended. If omitted, the model assumes global editing.
`*.txt` → edit instruction (one per pair). If any pair has no `.txt`, set `default_caption` or training will fail.
All files in a pair must share the same base name (e.g. edit_003).
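A small script can group files by that shared base name and flag incomplete pairs before upload. A sketch with the Python standard library; the function name and warning messages are illustrative, and it only checks filenames, not image contents:

```python
import pathlib
from collections import defaultdict

SUFFIXES = ("_input", "_start", "_target", "_end", "_mask")

def collect_pairs(dataset_dir: str) -> dict:
    """Group files by shared base name (e.g. edit_003) and flag incomplete pairs."""
    groups = defaultdict(dict)
    for p in pathlib.Path(dataset_dir).iterdir():
        stem, role = p.stem, None
        for s in SUFFIXES:
            if stem.endswith(s):
                stem, role = stem[: -len(s)], s.lstrip("_")
                break
        if role is None and p.suffix == ".txt":
            role = "instruction"          # bare .txt holds the edit instruction
        if role == "start":               # *_start is an alias for *_input
            role = "input"
        if role == "end":                 # *_end is an alias for *_target
            role = "target"
        if role:
            groups[stem][role] = p.name
    for base, files in sorted(groups.items()):
        if "input" not in files or "target" not in files:
            print(f"{base}: missing input or target")
        if "instruction" not in files:
            # Pairs without a .txt need default_caption set in the trainer.
            print(f"{base}: no instruction file")
    return dict(groups)
```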
Instruction guidelines
Describe only the change; do not restate the full image.
Use imperative edits (e.g. “replace the background with a snowy mountain landscape”).
One instruction per file; multi-line is allowed.
Dataset size recommendations
| Use case | Recommended pairs |
|---|---|
| Simple edits (background, color) | 100–300 |
| Inpainting / object replacement | 200–500 |
| Instruction-following editing | 500+ |
Editing models require more data than text-to-image (concept) fine-tuning.
Image requirements
Resolution ≥ 512×512; formats `.jpg`, `.png`, `.webp`.
Input and target must have the same resolution, be pixel-aligned, and differ only in the edited regions.
Avoid misaligned pairs, unrelated style changes, or multiple edits per example.
What not to include
Missing input/target pairs, wrong filename suffixes, non-binary masks, captions instead of edit instructions, or unlicensed content.
Training time
With default settings (e.g. 1000 steps), p-image-edit-trainer typically takes about 30–45 minutes for ~100 image pairs. p-image-trainer (text-to-image) is faster for comparable data.
Using LoRA weights at inference
Both P-Image LoRA inference (p-image-lora) and P-Image-Edit LoRA inference (p-image-edit-lora) accept:
`lora_weights` — URL to your LoRA. Supports Hugging Face: `huggingface.co/<owner>/<model-name>[/<lora-weights-file.safetensors>]`. Gated or private repos may require `hf_api_token`.
`lora_scale` — How strongly to apply the LoRA (default 1; 0–1+). Lower values blend more with the base model.
For full parameter lists and Replicate links, see P-Image and P-Image-Edit.