LoRA Training and Inference
This page explains what LoRAs are, how to prepare datasets for LoRA training, and how to use LoRA weights at inference time. For full parameter lists, see P-Image and P-Image-Edit.
Two workflows (don't mix them)
| Workflow | Use case | Training | Inference |
|---|---|---|---|
| P-Image LoRA Training and Inference | trigger word + prompt → image | p-image-trainer | p-image-lora |
| P-Image-Edit LoRA Training and Inference | trigger word + input image (+ prompt) → edited image | p-image-edit-trainer | p-image-edit-lora |
P-Image LoRA: train with p-image-trainer, run inference with p-image-lora. End-to-end notebook: P-Image LoRA: Training and Inference.
P-Image-Edit LoRA: train with p-image-edit-trainer, run inference with p-image-edit-lora. End-to-end notebook: P-Image-Edit LoRA: Training and Inference.
What are LoRAs?
LoRA (Low-Rank Adaptation) is a lightweight way to fine-tune diffusion models. Instead of retraining the full model, LoRA trains a small set of extra weights that are applied on top of the base model at inference time.
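Concretely, LoRA keeps the base weights frozen and learns a low-rank update for selected layers. A common formulation (scaling conventions vary between implementations, so treat this as a sketch) is:

W' = W + (α / r) · B · A, where B is a d×r matrix, A is an r×k matrix, and the rank r is much smaller than d and k.

Only A and B are saved, so the file size scales with the rank r rather than with the full d×k weight matrices, which is why LoRA files stay small.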
Benefits
Small files – LoRA weights are often a few dozen MB, not the full model size.
Faster training – Less compute than full fine-tuning.
Swappable – Use different LoRAs on the same base model without reloading.
Workflow
LoRA training – Upload a dataset to the trainer endpoint (P-Image LoRA: p-image-trainer; P-Image-Edit LoRA: p-image-edit-trainer). The trainer produces LoRA weights (e.g. hosted on Hugging Face); a minimal sketch of this call follows these steps.
LoRA inference – Call the inference endpoint (P-Image LoRA: p-image-lora; P-Image-Edit LoRA: p-image-edit-lora) with `lora_weights` set to your weights URL. Optionally set `lora_scale` to control strength (0–1+).
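As a rough sketch of the training step with the Replicate Python client: the model owner (your-org) and the dataset input field name (input_dataset) are placeholders, not confirmed parameter names; default_caption is the trainer setting discussed in the dataset sections below. Check the P-Image and P-Image-Edit pages for the real input schema.

```python
import replicate

# Hypothetical sketch: the owner and the dataset field name are placeholders.
# "default_caption" is the trainer setting described later on this page.
with open("dataset.zip", "rb") as dataset:
    training_output = replicate.run(
        "your-org/p-image-trainer",      # placeholder owner/model reference
        input={
            "input_dataset": dataset,    # placeholder field name for the zipped dataset
            "default_caption": "a photo of tok_my_concept",  # used for images without a .txt caption
        },
    )

# The trainer returns a reference to the produced LoRA weights
# (e.g. a URL to a .safetensors file hosted on Hugging Face).
print(training_output)
```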
Text-to-image: dataset preparation
Use this format for P-Image LoRA training (p-image-trainer): custom concepts, characters, products, or styles.
Note
You can find a properly formatted text-to-image dataset for stylized generation on Hugging Face here:
Folder structure
Your dataset must be a single folder (zipped for upload). Each image can have a matching caption file with the same base filename.
dataset/
├── image_001.jpg
├── image_001.txt
├── image_002.png
├── image_002.txt
├── image_003.webp
└── image_003.txt
Rules
Every image can have a `.txt` caption file (e.g. `photo.txt` for `photo.jpg`). If captions are missing, set `default_caption` in the trainer or training will fail.
Use at least 10 images, and ideally more. Quality and diversity matter more than raw quantity.
Caption guidelines
Describe what you want the model to learn. Use clear, natural language; multi-line is allowed.
For a person, character, product, or specific visual concept, use a unique trigger word in every caption (e.g. `tok_my_concept` or `sks_my_concept`). Use it consistently so you can prompt the trained model with that same token later.
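As a purely illustrative example (the trigger token tok_my_concept and the descriptions below are made up), caption files that follow these guidelines could be written like this:

```python
from pathlib import Path

# Illustrative captions only: the trigger token and descriptions are examples.
# Each caption describes its image in natural language and repeats the same token.
captions = {
    "image_001.txt": "a photo of tok_my_concept standing on a beach at sunset",
    "image_002.txt": "tok_my_concept wearing a red jacket, studio lighting, front view",
    "image_003.txt": "close-up portrait of tok_my_concept, soft natural light",
}

dataset_dir = Path("dataset")
for filename, caption in captions.items():
    (dataset_dir / filename).write_text(caption + "\n")
```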
Dataset size recommendations
| Use case | Recommended images |
|---|---|
| Person / character | 15–40 |
| Product | 20–50 |
| Style / aesthetic | 30–100 |
| General concept | 50+ |
Image requirements
Minimum resolution 512×512; formats `.jpg`, `.png`, `.webp`.
Prefer varied angles, lighting, backgrounds, and poses. Avoid duplicates, watermarks, or low-resolution images.
What not to include
Nested folders, missing or mismatched caption files, reused captions, unlicensed or disallowed content.
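Before zipping and uploading, a quick script can catch most of the problems listed above. This is an unofficial helper, not part of the trainer; it assumes the folder layout described in this section and requires Pillow.

```python
from pathlib import Path

from PIL import Image  # pip install pillow

ALLOWED = {".jpg", ".png", ".webp"}

def check_t2i_dataset(folder: str) -> None:
    root = Path(folder)
    images = [p for p in root.iterdir() if p.suffix.lower() in ALLOWED]

    # At least 10 images are recommended; more (and more varied) is better.
    if len(images) < 10:
        print(f"Warning: only {len(images)} images found; use at least 10.")

    # Every image should have a matching .txt caption, or default_caption must be set.
    missing = [p.name for p in images if not p.with_suffix(".txt").exists()]
    if missing:
        print(f"No caption for: {missing} (add .txt files or set default_caption)")

    # Minimum resolution is 512x512.
    for p in images:
        with Image.open(p) as im:
            if min(im.size) < 512:
                print(f"{p.name}: {im.size[0]}x{im.size[1]} is below 512x512")

    # Nested folders are not allowed.
    nested = [p.name for p in root.iterdir() if p.is_dir()]
    if nested:
        print(f"Remove nested folders: {nested}")

check_t2i_dataset("dataset")
```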
Image editing: dataset preparation
Use this format for P-Image-Edit LoRA training (p-image-edit-trainer): inpainting and instruction-based editing. Editing requires paired data: an original image, an edited target, and optionally a mask and instruction.
Note
You can find properly formatted image editing datasets on Hugging Face here:
Folder structure
Each example is identified by a shared base filename.
dataset/
├── edit_001_input.png
├── edit_001_target.png
├── edit_001.txt
├── edit_002_input.jpg
├── edit_002_target.jpg
└── edit_002.txt
Optional (inpainting / masked editing)
dataset/
├── edit_003_input.png
├── edit_003_target.png
├── edit_003_mask.png
└── edit_003.txt
Naming rules
`*_input` (or `*_start`) – original image
`*_target` (or `*_end`) – edited result (what the model should learn to produce)
`*_mask` – optional binary mask (white = editable area, black = frozen). Same resolution as the input; `.png` recommended. If omitted, the model assumes global editing.
`*.txt` – edit instruction (one per pair). If any pair has no `.txt`, set `default_caption` or training will fail.
All files in a pair must share the same base name (e.g. edit_003).
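These pairing rules are easy to violate at scale, so it can be worth checking the folder before upload. The sketch below is an unofficial helper that assumes the *_input / *_target / *_mask / .txt naming described above and requires Pillow.

```python
from pathlib import Path

from PIL import Image  # pip install pillow

ALLOWED = {".jpg", ".png", ".webp"}

def check_edit_dataset(folder: str) -> None:
    root = Path(folder)

    # Collect input images keyed by the shared base name, e.g. "edit_003".
    inputs = {}
    for p in root.iterdir():
        if p.suffix.lower() in ALLOWED and p.stem.endswith(("_input", "_start")):
            base = p.stem.rsplit("_", 1)[0]
            inputs[base] = p

    for base, inp in sorted(inputs.items()):
        # Find the matching target (accepting either *_target or *_end).
        target = next((root / f"{base}_{suffix}{ext}"
                       for suffix in ("target", "end") for ext in ALLOWED
                       if (root / f"{base}_{suffix}{ext}").exists()), None)
        if target is None:
            print(f"{base}: missing *_target / *_end image")
            continue

        # Input and target must share the same resolution (pixel-aligned pairs).
        with Image.open(inp) as a, Image.open(target) as b:
            if a.size != b.size:
                print(f"{base}: input {a.size} and target {b.size} resolutions differ")

        # Optional mask must be strictly black/white.
        mask = root / f"{base}_mask.png"
        if mask.exists():
            with Image.open(mask) as m:
                if not set(m.convert("L").getdata()) <= {0, 255}:
                    print(f"{base}: mask is not strictly black/white")

        # Each pair needs an instruction file unless default_caption is set.
        if not (root / f"{base}.txt").exists():
            print(f"{base}: no .txt instruction (set default_caption or add one)")

check_edit_dataset("dataset")
```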
Instruction guidelines
Describe only the change; do not restate the full image.
Use imperative edits (e.g. βreplace the background with a snowy mountain landscapeβ).
One instruction per file; multi-line is allowed.
Dataset size recommendations
| Use case | Recommended pairs |
|---|---|
| Simple edits (background, color) | 100–300 |
| Inpainting / object replacement | 200–500 |
| Instruction-following editing | 500+ |
Editing models require more data than text-to-image (concept) fine-tuning.
Image requirements
Resolution ≥ 512×512; formats `.jpg`, `.png`, `.webp`.
Input and target must have the same resolution, be pixel-aligned, and differ only in the edited regions.
Avoid misaligned pairs, unrelated style changes, or multiple edits per example.
What not to include
Missing input/target pairs, wrong filename suffixes, non-binary masks, captions instead of edit instructions, or unlicensed content.
Training time
With default settings (e.g. 1000 steps), p-image-edit-trainer typically takes about 30–45 minutes for ~100 image pairs. p-image-trainer (text-to-image) is faster for comparable data.
Using LoRA weights at inference
Both P-Image LoRA inference (p-image-lora) and P-Image-Edit LoRA inference (p-image-edit-lora) accept:
`lora_weights` – URL to your LoRA. Supports Hugging Face: `huggingface.co/<owner>/<model-name>[/<lora-weights-file.safetensors>]`. Gated or private repos may require `hf_api_token`.
`lora_scale` – How strongly to apply the LoRA (default 1; 0–1+). Lower values blend more with the base model.
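Putting it together, an inference call with the Replicate Python client might look like the following. The model owner (your-org), the weights URL, and the prompt input name are placeholders and assumptions; lora_weights, lora_scale, and hf_api_token are the parameters described above.

```python
import replicate

# Sketch of an inference call; the model owner and weights URL below are placeholders.
output = replicate.run(
    "your-org/p-image-lora",
    input={
        "prompt": "a photo of tok_my_concept riding a bicycle",  # use your trigger word
        "lora_weights": "huggingface.co/your-hf-user/my-lora/lora.safetensors",
        "lora_scale": 0.8,            # values below 1 blend more with the base model
        # "hf_api_token": "hf_...",   # only needed for gated or private repos
    },
)
print(output)
```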
For full parameter lists and Replicate links, see P-Image and P-Image-Edit.