Prompt
Asuka Langley sitting cross-legged on a large, high-backed armchair. She is wearing a glossy, reflective red plugsuit. The plush armchair is upholstered in dark fabric. She is in an elegantly decorated room with a warm, glowing fireplace.

Notes

Various experiments, some decent, some bad. Trained on publicly accessible images with AI Toolkit on default settings, using the unquantized model and encoder. The best checkpoint should be the last one, but your mileage may vary. Trigger words are not necessary but can be used to make the effect stronger.

Digital Art

26 x 768px images, mostly close-ups, 4000 steps. I forgot to change the bucket size to 768px and trained them at 1024px; no idea if that affected anything. Booru-style captions, but with tags joined and count numerals removed, e.g., "girl" instead of "1girl", "long blonde hair" instead of "long hair, blonde hair".
Surprisingly good at all resolutions despite the small size, but the fingers can get somewhat strange; the originals have some rather creative gestures and poses.
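
The caption convention used for this LoRA (count numerals dropped, tags that share a noun joined) could be automated roughly as below. This is only a sketch: the notes don't say how the captions were actually produced, and `join_booru_tags` is a hypothetical helper, not part of AI Toolkit or any tagger.

```python
import re

# Hypothetical helper illustrating the caption style described in the notes;
# not the author's actual script.
def join_booru_tags(caption):
    """Drop count numerals ("1girl" -> "girl") and merge tags sharing a final noun."""
    tags = [t.strip() for t in caption.split(",") if t.strip()]

    # Only strip numerals from count tags such as "1girl", "2boys", "6+girls".
    count_prefix = re.compile(r"^\d+\+?(?=(?:girl|boy|other)s?$)")
    tags = [count_prefix.sub("", t) for t in tags]

    # Group modifiers by the last word of each tag; dicts keep insertion order,
    # so the output follows the order in which nouns first appear.
    groups = {}
    for tag in tags:
        *modifiers, noun = tag.split()
        slot = groups.setdefault(noun, [])
        for modifier in modifiers:
            if modifier not in slot:
                slot.append(modifier)

    return ", ".join(" ".join(mods + [noun]) for noun, mods in groups.items())


print(join_booru_tags("1girl, long hair, blonde hair"))  # -> girl, long blonde hair
```

Merging purely on the last word is a simplification: it would also fuse tags that describe different things with the same noun (two characters' eye colours, for instance), so real captions would still need a manual pass.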

Digital Illustration

25 x 1024px images, mostly close-ups, 1024px buckets, 4000 steps. Booru-style captions as base but rewritten to be more natural, e.g., "a girl with long, blonde hair" instead of "1girl, long hair, blonde hair".
Pretty good but the originals tend to gravitate towards a certain size in the chest area and it can be difficult to change that aspect.

Digital Painting

v1 - 144 x 2048px images, 1024px and 1536px buckets, 4000 steps. Simple captions generated with Florence-2.
Pretty bad - details look sharp but the style is inconsistent.

v2 - 72 x 2048px images, 1024px buckets, 7000 steps. Very detailed captions generated with Qwen3-VL-32B.
Decent - details look slightly worse than v1 but the style is much more consistent. I'd probably use even fewer images for v3 but train with 1536px buckets and simpler captions.

Game Art

270 x 1024px images extracted from a certain cute and funny game, 1024px buckets, 5000 steps, very detailed Qwen3-VL-32B captions.
Very bad - way too many images; it looks like the model averaged the style instead of learning it. I'd definitely use fewer images and simpler captions for v2.

Ink Art

82 x 2048px images, 1024px and 1536px buckets, 4000 steps, very detailed Qwen3-VL-32B captions.
Bad - very inconsistent at applying the style; it seems to randomly work better at higher resolutions. Same recommendations as for Game Art: fewer images and simpler captions.

Oil Painting

60 x 2048px images, 1024px and 1536px buckets, 4000 steps, simple Florence-2 captions.
Good - no real complaints. I tried training a v2 with more detailed captions but it didn't change much. The style seems very easy for the model to grasp.


Model tree for Zuellni/Z-Image-LoRAs

Base model

Tongyi-MAI/Z-Image-Turbo
