LoRA Training and Inference

This page explains what LoRAs are, how to prepare datasets for LoRA training, and how to use the resulting LoRA weights at inference time. For full parameter lists, see P-Image and P-Image-Edit.

Two workflows (don't mix them)

| Workflow | Use case | Training | Inference |
| --- | --- | --- | --- |
| P-Image LoRA Training and Inference | trigger word + prompt → image | p-image-trainer | p-image-lora |
| P-Image-Edit LoRA Training and Inference | trigger word + input image (+ prompt) → edited image | p-image-edit-trainer | p-image-edit-lora |

What are LoRAs?

LoRA (Low-Rank Adaptation) is a lightweight way to fine-tune diffusion models. Instead of retraining the full model, LoRA trains a small set of extra weights that are applied on top of the base model at inference time.

Benefits

  • Small files: LoRA weights are often a few dozen MB, not the size of the full model.

  • Faster training: less compute than full fine-tuning.

  • Swappable: use different LoRAs on the same base model without reloading it.

Workflow

  1. LoRA training: upload your zipped dataset to the trainer endpoint (P-Image LoRA: p-image-trainer; P-Image-Edit LoRA: p-image-edit-trainer). The trainer produces LoRA weights (e.g. hosted on Hugging Face); a training sketch follows this list.

  2. LoRA inference: call the inference endpoint (P-Image LoRA: p-image-lora; P-Image-Edit LoRA: p-image-edit-lora) with lora_weights set to your weights URL. Optionally set lora_scale to control strength (0–1+); see "Using LoRA weights at inference" below for an example call.
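
As a rough illustration of step 1 with the Replicate Python client: the owner prefix your-org and the input field names below are assumptions for this sketch, not the real schema; check the P-Image page for the exact parameters.

import replicate

# Minimal training sketch. "your-org" and the input field names are
# illustrative assumptions; consult the p-image-trainer schema.
weights = replicate.run(
    "your-org/p-image-trainer",
    input={
        "input_images": open("dataset.zip", "rb"),  # zipped dataset folder
        "trigger_word": "tok_my_concept",
        "steps": 1000,  # default mentioned under "Training time"
    },
)
print(weights)  # URL of the trained LoRA weights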

Text-to-image: dataset preparation

Use this format for P-Image LoRA training (p-image-trainer): custom concepts, characters, products, or styles.

Note

A properly formatted text-to-image dataset for stylized generation is available on Hugging Face.

Folder structure

Your dataset must be a single folder (zipped for upload). Each image can have a matching caption file with the same base filename.

dataset/
├── image_001.jpg
├── image_001.txt
├── image_002.png
├── image_002.txt
├── image_003.webp
└── image_003.txt

Rules

  • Every image can have a .txt caption file (e.g. photo.txt for photo.jpg). If any captions are missing, set default_caption in the trainer; otherwise training will fail (see the check after this list).

  • Use at least 10 images; a few more is better. Quality and diversity matter more than raw quantity.
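
A minimal Python sketch of the caption check, assuming the flat dataset/ layout above:

from pathlib import Path

IMAGE_EXTS = {".jpg", ".png", ".webp"}
dataset = Path("dataset")

# Report images that have no matching .txt caption file; these need
# default_caption to be set in the trainer.
missing = [p.name for p in sorted(dataset.iterdir())
           if p.suffix.lower() in IMAGE_EXTS
           and not p.with_suffix(".txt").exists()]
print("images without captions:", missing or "none")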

Caption guidelines

  • Describe what you want the model to learn. Use clear, natural language; multi-line is allowed.

  • For a person, character, product, or specific visual concept, use a unique trigger word in every caption (e.g. tok_my_concept or sks_my_concept). Use it consistently so you can prompt the trained model with that same token later; an example caption follows this list.
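
An illustrative caption for a product dataset (the subject and wording are made up for this example) might read:

a studio photo of tok_my_concept, a ceramic coffee mug, on a wooden table, soft natural light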

Dataset size recommendations

| Use case | Recommended images |
| --- | --- |
| Person / character | 15–40 |
| Product | 20–50 |
| Style / aesthetic | 30–100 |
| General concept | 50+ |

Image requirements

  • Minimum resolution 512×512; formats .jpg, .png, .webp.

  • Prefer varied angles, lighting, backgrounds, and poses. Avoid duplicates, watermarks, or low-resolution images (a quick check follows this list).
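
A minimal Pillow sketch of the resolution and format check, assuming the flat dataset/ layout above:

from pathlib import Path
from PIL import Image

IMAGE_EXTS = {".jpg", ".png", ".webp"}

# Flag unsupported formats and images smaller than 512x512.
for p in sorted(Path("dataset").iterdir()):
    suffix = p.suffix.lower()
    if suffix == ".txt":
        continue  # caption files are fine
    if suffix not in IMAGE_EXTS:
        print("unsupported format:", p.name)
        continue
    width, height = Image.open(p).size
    if width < 512 or height < 512:
        print(f"too small ({width}x{height}):", p.name)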

What not to include

Nested folders, missing or mismatched caption files, reused captions, unlicensed or disallowed content.

Image editing: dataset preparation

Use this format for P-Image-Edit LoRA training (p-image-edit-trainer): inpainting and instruction-based editing. Editing requires paired data: an original image, an edited target, and optionally a mask and an instruction.

Note

Properly formatted image-editing datasets are available on Hugging Face.

Folder structure

Each example is identified by a shared base filename.

dataset/
├── edit_001_input.png
├── edit_001_target.png
├── edit_001.txt
├── edit_002_input.jpg
├── edit_002_target.jpg
└── edit_002.txt

Optional (inpainting / masked editing)

dataset/
├── edit_003_input.png
├── edit_003_target.png
├── edit_003_mask.png
└── edit_003.txt

Naming rules

  • *_input (or *_start) → original image

  • *_target (or *_end) → edited result (what the model should learn to produce)

  • *_mask → optional binary mask (white = editable area, black = frozen). Same resolution as the input; .png recommended. If omitted, the model assumes global editing.

  • *.txt → edit instruction (one per pair). If any pair has no .txt, set default_caption or training will fail.

All files in a pair must share the same base name (e.g. edit_003).
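
A Python sketch that applies these naming rules, grouping files by base name and reporting incomplete examples:

from pathlib import Path
from collections import defaultdict

SUFFIXES = ("_input", "_start", "_target", "_end", "_mask")
groups = defaultdict(set)

# Group files by shared base name and record which parts are present.
for p in Path("dataset").iterdir():
    for s in SUFFIXES:
        if p.stem.endswith(s):
            groups[p.stem[: -len(s)]].add(s.lstrip("_"))
            break
    else:
        if p.suffix == ".txt":
            groups[p.stem].add("txt")

# Every example needs an input (or start) and a target (or end).
for base, parts in sorted(groups.items()):
    if not parts & {"input", "start"} or not parts & {"target", "end"}:
        print(f"{base}: incomplete example, found {sorted(parts)}")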

Instruction guidelines

  • Describe only the change; do not restate the full image.

  • Use imperative edits (e.g. "replace the background with a snowy mountain landscape").

  • One instruction per file; multi-line is allowed. An example instruction follows this list.
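
For example, edit_001.txt might contain a single instruction such as (illustrative):

replace the plain background with a snowy mountain landscape; keep the subject unchanged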

Dataset size recommendations

| Use case | Recommended pairs |
| --- | --- |
| Simple edits (background, color) | 100–300 |
| Inpainting / object replacement | 200–500 |
| Instruction-following editing | 500+ |

Editing models require more data than text-to-image (concept) fine-tuning.

Image requirements

  • Resolution ≥ 512×512; formats .jpg, .png, .webp.

  • Input and target must have the same resolution, be pixel-aligned, and differ only in the edited regions.

  • Avoid misaligned pairs, unrelated style changes, or multiple edits per example (a resolution check follows this list).
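
A Pillow sketch of the matching-resolution check, assuming the *_input / *_target naming above:

from pathlib import Path
from PIL import Image

# Verify that each input/target pair shares identical dimensions.
for inp in sorted(Path("dataset").glob("*_input.*")):
    base = inp.stem[: -len("_input")]
    targets = list(Path("dataset").glob(f"{base}_target.*"))
    if not targets:
        print(f"{base}: no matching target file")
        continue
    if Image.open(inp).size != Image.open(targets[0]).size:
        print(f"{base}: input and target resolutions differ")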

What not to include

Missing input/target pairs, wrong filename suffixes, non-binary masks, captions instead of edit instructions, or unlicensed content.

Training time

With default settings (e.g. 1000 steps), p-image-edit-trainer typically takes about 30–45 minutes for ~100 image pairs. p-image-trainer (text-to-image) is faster for comparable data.

Using LoRA weights at inference

Both P-Image LoRA inference (p-image-lora) and P-Image-Edit LoRA inference (p-image-edit-lora) accept:

  • lora_weights: URL to your LoRA. Supports Hugging Face: huggingface.co/<owner>/<model-name>[/<lora-weights-file.safetensors>]. Gated or private repos may require hf_api_token.

  • lora_scale: how strongly to apply the LoRA (default 1; 0–1+). Lower values blend more with the base model. An example call follows this list.
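
For example, with the Replicate Python client (your-org and the prompt field name are assumptions for this sketch; only lora_weights and lora_scale come from this page):

import replicate

# Minimal inference sketch for a trained text-to-image LoRA.
# "your-org" and the exact field names are illustrative assumptions.
output = replicate.run(
    "your-org/p-image-lora",
    input={
        "prompt": "a photo of tok_my_concept hiking a mountain trail",
        "lora_weights": "huggingface.co/your-name/my-lora",
        "lora_scale": 0.8,  # lower blends more with the base model
    },
)
print(output)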

For full parameter lists and Replicate links, see P-Image and P-Image-Edit.