LoRA Training and Inference
This page explains what LoRAs are, how to prepare datasets for LoRA training, and how to use LoRA weights at inference time. For full parameter lists, see P-Image and P-Image-Edit.
Two workflows (don’t mix them)
| Workflow | Use case | Training | Inference |
|---|---|---|---|
| P-Image LoRA | trigger word + prompt → image | p-image-trainer | p-image-lora |
| P-Image-Edit LoRA | trigger word + input image (+ prompt) → edited image | p-image-edit-trainer | p-image-edit-lora |
P-Image LoRA: train with p-image-trainer, run inference with p-image-lora. End-to-end notebook: P-Image LoRA: Training and Inference.
P-Image-Edit LoRA: train with p-image-edit-trainer, run inference with p-image-edit-lora. End-to-end notebook: P-Image-Edit LoRA: Training and Inference.
What are LoRAs?
LoRA (Low-Rank Adaptation) is a lightweight way to fine-tune diffusion models. Instead of retraining the full model, LoRA trains a small set of extra weights that are applied on top of the base model at inference time.
Benefits
Small files — LoRA weights are often a few dozen MB, not full model size.
Faster training — Less compute than full fine-tuning.
Swappable — Use different LoRAs on the same base model without reloading.
Workflow
LoRA training — Upload a dataset to the trainer endpoint (P-Image LoRA: p-image-trainer; P-Image-Edit LoRA: p-image-edit-trainer). The trainer produces LoRA weights (e.g. hosted on Hugging Face).
LoRA inference — Call the inference endpoint (P-Image LoRA: p-image-lora; P-Image-Edit LoRA: p-image-edit-lora) with `lora_weights` set to your weights URL. Optionally set `lora_scale` to control strength (0–1+).
Text-to-image: dataset preparation
Use this format for P-Image LoRA training (p-image-trainer) — custom concepts, characters, products, or styles.
Note
You can find a properly formatted text-to-image dataset for stylized generation on Hugging Face here:
Folder structure
Your dataset must be a single folder (zipped for upload). Each image can have a matching caption file with the same base filename.
dataset/
├── image_001.jpg
├── image_001.txt
├── image_002.png
├── image_002.txt
├── image_003.webp
├── image_003.txt
Rules
Every image can have a `.txt` caption file (e.g. `photo.txt` for `photo.jpg`). If any captions are missing, set `default_caption` in the trainer or training will fail.
Use at least 10 images, preferably more. Quality and diversity matter more than raw quantity.
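Before uploading, it can help to sanity-check the folder against these rules and zip it in one step. A minimal sketch using only the Python standard library; the helper name and printed warnings are illustrative, not part of the trainer API:

```python
import pathlib
import shutil

IMAGE_EXTS = {".jpg", ".png", ".webp"}

def check_and_zip(dataset_dir: str) -> str:
    """Warn about rule violations, then zip the folder for upload."""
    root = pathlib.Path(dataset_dir)
    images = [p for p in root.iterdir() if p.suffix.lower() in IMAGE_EXTS]
    if len(images) < 10:
        print(f"warning: only {len(images)} images; at least 10 recommended")
    missing = [p.name for p in images if not p.with_suffix(".txt").exists()]
    if missing:
        # Images without captions require default_caption to be set in the trainer.
        print("images without captions:", sorted(missing))
    # make_archive appends .zip to the base name and archives the folder contents
    return shutil.make_archive(root.name, "zip", root_dir=root)
```

`shutil.make_archive` writes the zip into the current working directory, so run the script from wherever you want the upload artifact to land.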
Caption guidelines
Describe what you want the model to learn. Use clear, natural language; multi-line is allowed.
For a person, character, product, or specific visual concept, use a unique trigger word in every caption (e.g. `tok_my_concept` or `sks_my_concept`). Use it consistently so you can prompt the trained model with that same token later.
Dataset size recommendations
| Use case | Recommended images |
|---|---|
| Person / character | 15–40 |
| Product | 20–50 |
| Style / aesthetic | 30–100 |
| General concept | 50+ |
Image requirements
Minimum resolution 512×512; formats `.jpg`, `.png`, `.webp`.
Prefer varied angles, lighting, backgrounds, and poses. Avoid duplicates, watermarks, or low-resolution images.
What not to include
Nested folders, missing or mismatched caption files, reused captions, unlicensed or disallowed content.
Image editing: dataset preparation
Use this format for P-Image-Edit LoRA training (p-image-edit-trainer) — inpainting, instruction-based editing. Editing requires paired data: original image, edited target, and optionally a mask and instruction.
Note
You can find properly formatted image-editing datasets on Hugging Face here:
Folder structure
Each example is identified by a shared base filename.
dataset/
├── edit_001_input.png
├── edit_001_target.png
├── edit_001.txt
├── edit_002_input.jpg
├── edit_002_target.jpg
├── edit_002.txt
Optional (inpainting / masked editing)
dataset/
├── edit_003_input.png
├── edit_003_target.png
├── edit_003_mask.png
├── edit_003.txt
Naming rules
`*_input` (or `*_start`) → original image
`*_target` (or `*_end`) → edited result (what the model should learn to produce)
`*_mask` → optional binary mask (white = editable area, black = frozen). Same resolution as the input; `.png` recommended. If omitted, the model assumes global editing.
`*.txt` → edit instruction (one per pair). If any pair has no `.txt`, set `default_caption` or training will fail.
All files in a pair must share the same base name (e.g. edit_003).
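A small script can group files by that shared base name and flag incomplete pairs before upload. A sketch with the Python standard library; the function name and warning messages are illustrative, and it only checks filenames, not image contents:

```python
import pathlib
from collections import defaultdict

SUFFIXES = ("_input", "_start", "_target", "_end", "_mask")

def collect_pairs(dataset_dir: str) -> dict:
    """Group files by shared base name (e.g. edit_003) and flag incomplete pairs."""
    groups = defaultdict(dict)
    for p in pathlib.Path(dataset_dir).iterdir():
        stem, role = p.stem, None
        for s in SUFFIXES:
            if stem.endswith(s):
                stem, role = stem[: -len(s)], s.lstrip("_")
                break
        if role is None and p.suffix == ".txt":
            role = "instruction"          # bare .txt holds the edit instruction
        if role == "start":               # *_start is an alias for *_input
            role = "input"
        if role == "end":                 # *_end is an alias for *_target
            role = "target"
        if role:
            groups[stem][role] = p.name
    for base, files in sorted(groups.items()):
        if "input" not in files or "target" not in files:
            print(f"{base}: missing input or target")
        if "instruction" not in files:
            # Pairs without a .txt need default_caption set in the trainer.
            print(f"{base}: no instruction file")
    return dict(groups)
```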
Instruction guidelines
Describe only the change; do not restate the full image.
Use imperative edits (e.g. “replace the background with a snowy mountain landscape”).
One instruction per file; multi-line is allowed.
Dataset size recommendations
| Use case | Recommended pairs |
|---|---|
| Simple edits (background, color) | 100–300 |
| Inpainting / object replacement | 200–500 |
| Instruction-following editing | 500+ |
Editing models require more data than text-to-image (concept) fine-tuning.
Image requirements
Resolution ≥ 512×512; formats `.jpg`, `.png`, `.webp`.
Input and target must have the same resolution, be pixel-aligned, and differ only in the edited regions.
Avoid misaligned pairs, unrelated style changes, or multiple edits per example.
What not to include
Missing input/target pairs, wrong filename suffixes, non-binary masks, captions instead of edit instructions, or unlicensed content.
Training time
With default settings (e.g. 1000 steps), p-image-edit-trainer typically takes about 30–45 minutes for ~100 image pairs. p-image-trainer (text-to-image) is faster for comparable data.
Using LoRA weights at inference
Both P-Image LoRA inference (p-image-lora) and P-Image-Edit LoRA inference (p-image-edit-lora) accept:
`lora_weights` — URL to your LoRA. Supports Hugging Face: `huggingface.co/<owner>/<model-name>[/<lora-weights-file.safetensors>]`. Gated or private repos may require `hf_api_token`.
`lora_scale` — How strongly to apply the LoRA (default 1; 0–1+). Lower values blend more with the base model.
For full parameter lists and Replicate links, see P-Image and P-Image-Edit.