---
title: WeavePrompt
emoji: 🎨
colorFrom: blue
colorTo: purple
sdk: docker
sdk_version: 5.44.1
pinned: false
license: mit
app_file: app.py
app_port: 7860
---

# WeavePrompt

Iterative prompt refinement for image generation models: given a target image, **WeavePrompt** automatically generates and refines text prompts so that a model's output resembles the target, using vision-language models and perceptual metrics.

## Introduction

**WeavePrompt** is a research and development project designed to evaluate and refine text-to-image generation prompts across multiple state-of-the-art image generation models.
The primary goal is to optimize prompts such that the generated images align closely with a given reference image, improving both fidelity and semantic consistency.

**Procedure/Implementation**:
The process involves generating images from identical prompts using various image generation models, comparing the results to a reference image through a recognition and similarity evaluation pipeline, and iteratively adjusting the prompt to minimize perceptual differences.
This feedback loop continues for a set number of iterations, progressively enhancing prompt effectiveness.
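
Conceptually, the loop looks roughly like the sketch below. This is a simplified illustration rather than the project's actual implementation: the `describe`, `generate`, `refine`, and `distance` callables stand in for the vision-language captioner, the image generation model, the prompt-rewriting step, and the perceptual metric.

```python
# Simplified sketch of the prompt refinement loop (placeholder callables, not the real code).
def optimize_prompt(target_image, describe, generate, refine, distance, num_steps=10):
    prompt = describe(target_image)                # initial caption from the vision-language model
    best_prompt, best_score = prompt, float("inf")
    history = []

    for step in range(num_steps):
        candidate = generate(prompt)               # image produced by the generation model
        score = distance(candidate, target_image)  # perceptual distance, e.g. LPIPS (lower is better)
        history.append({"step": step, "prompt": prompt, "score": score})

        if score < best_score:
            best_prompt, best_score = prompt, score

        prompt = refine(prompt, target_image, candidate)  # ask the VLM to rewrite the prompt

    return best_prompt, history
```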

To achieve this, **WeavePrompt** integrates advanced tools:

- **Image recognition** is powered by meta-llama/Llama-4-Scout-17B-16E-Instruct.

- **Similarity evaluation** uses the **LPIPS (alex)** metric for perceptual comparison (a minimal usage sketch follows this list).

- **Image generation models** under evaluation include:
	- FLUX family: FLUX.1 [pro], [dev], and [schnell]
	- Google models: Imagen 4, Imagen 4 Ultra, and Gemini 2.5 Flash Image
	- Other models: Stable Diffusion 3.5 Large and Qwen Image
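
As a concrete reference for the similarity step, the snippet below shows how an LPIPS score can be computed with the `lpips` PyTorch package and the AlexNet backbone; the random tensors are placeholders for real preprocessed images.

```python
import lpips
import torch

# LPIPS with the AlexNet backbone, used for perceptual comparison
loss_fn = lpips.LPIPS(net="alex")

# Inputs must be float tensors of shape (N, 3, H, W), scaled to [-1, 1]
img_generated = torch.rand(1, 3, 256, 256) * 2 - 1  # placeholder for the generated image
img_target = torch.rand(1, 3, 256, 256) * 2 - 1     # placeholder for the target image

distance = loss_fn(img_generated, img_target)
print(distance.item())  # lower means perceptually more similar
```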

By systematically combining prompt optimization with multi-model evaluation, **WeavePrompt** aims to advance the understanding of cross-model prompt effectiveness and improve controllability in image generation tasks.

## Features
- Upload a target image
- Step-by-step prompt optimization
- View prompt and generated image at each iteration
- Full optimization history

## Installation

1. Clone the repository:
	```bash
	git clone https://github.com/kevin1kevin1k/WeavePrompt.git
	cd WeavePrompt
	```
2. Install dependencies:
	```bash
	uv venv
	uv sync
	source .venv/bin/activate
	```
3. Set up `.env`:
	Put the following inside `.env`:
	- API keys `WANDB_API_KEY` and `FAL_KEY`
	- Weave project name `WEAVE_PROJECT`
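	For example (placeholder values; replace with your own keys and project name):
	```
	WANDB_API_KEY=your-wandb-api-key
	FAL_KEY=your-fal-api-key
	WEAVE_PROJECT=your-weave-project
	```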

## Usage

Run the demo app:
```bash
streamlit run src/app.py
```

Follow the instructions in the browser to upload an image and step through the optimization process.

## Architecture Diagram

![diagram](./diagram.png)


## Outcome

![outcome](./outcome.png)

Using the same prompt as the standard model, the target model yields a similarly high-quality output.

## References
- https://arxiv.org/abs/1801.03924 - The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
- https://arxiv.org/abs/2510.06335 - Image Reconstruction from Highly Undersampled Data
- https://arxiv.org/abs/2510.03191 - Product-Quantised Image Representation for High-Quality Image Synthesis