---
title: WeavePrompt
emoji: 🎨
colorFrom: blue
colorTo: purple
sdk: docker
sdk_version: 5.44.1
pinned: false
license: mit
app_file: app.py
app_port: 7860
---
# WeavePrompt
Iterative prompt refinement for image generation models. Given a target image, **WeavePrompt** automatically generates and refines text prompts so that a model's output resembles the target, using vision-language models and perceptual metrics.
## Introduction
**WeavePrompt** is a research and development project designed to evaluate and refine text-to-image generation prompts across multiple state-of-the-art image generation models.
The primary goal is to optimize prompts such that the generated images align closely with a given reference image, improving both fidelity and semantic consistency.
**Procedure/Implementation**:
The process involves generating images from identical prompts using various image generation models, comparing the results to a reference image through a recognition and similarity evaluation pipeline, and iteratively adjusting the prompt to minimize perceptual differences.
This feedback loop continues for a set number of iterations, progressively enhancing prompt effectiveness.
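The feedback loop described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the four callables (`describe`, `generate`, `distance`, `revise`) and the `Step` record are hypothetical names standing in for the vision-language model, the image generator, the perceptual metric, and the prompt-revision step.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Step:
    """One iteration: the prompt tried, the image produced, and its score."""
    prompt: str
    image: Any
    score: float  # perceptual distance to the target; lower is closer


def refine_prompt(
    target: Any,
    describe: Callable[[Any], str],          # hypothetical: VLM captions the target
    generate: Callable[[str], Any],          # hypothetical: text-to-image call
    distance: Callable[[Any, Any], float],   # hypothetical: e.g. an LPIPS wrapper
    revise: Callable[[str, Any, Any], str],  # hypothetical: VLM proposes a new prompt
    iterations: int = 5,
) -> List[Step]:
    """Run a fixed number of refinement iterations and return the full history."""
    history: List[Step] = []
    prompt = describe(target)
    for _ in range(iterations):
        image = generate(prompt)
        history.append(Step(prompt, image, distance(target, image)))
        prompt = revise(prompt, target, image)
    return history


def best_step(history: List[Step]) -> Step:
    """Pick the iteration whose image is perceptually closest to the target."""
    return min(history, key=lambda step: step.score)
```

Keeping the full history rather than only the final prompt matters because a later iteration is not guaranteed to score better than an earlier one; `best_step` selects the closest result after the fact.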
To achieve this, **WeavePrompt** integrates advanced tools:
- **Image recognition** is powered by meta-llama/Llama-4-Scout-17B-16E-Instruct.
- **Similarity evaluation** uses the **LPIPS (alex)** metric for perceptual comparison.
- **Image generation models** under evaluation include:
  - FLUX family: FLUX.1 [pro], [dev], and [schnell]
  - Google models: Imagen 4, Imagen 4 Ultra, and Gemini 2.5 Flash Image
  - Other models: Stable Diffusion 3.5 Large and Qwen Image
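To compare models on equal footing, the same prompt can be sent to each generator and the outputs ranked by their distance to the target. The sketch below assumes a hypothetical `generators` mapping (model label to text-to-image callable) and a hypothetical `distance` callable; neither name comes from the repository. With LPIPS, a lower value means the two images are perceptually closer.

```python
from typing import Any, Callable, Dict, List, Tuple


def rank_models(
    prompt: str,
    target: Any,
    generators: Dict[str, Callable[[str], Any]],  # hypothetical: label -> generator
    distance: Callable[[Any, Any], float],        # hypothetical: perceptual metric
) -> List[Tuple[str, float]]:
    """Generate one image per model from the same prompt and sort by distance."""
    scores = [
        (name, distance(target, generate(prompt)))
        for name, generate in generators.items()
    ]
    # Ascending order: the model whose output is closest to the target comes first.
    return sorted(scores, key=lambda item: item[1])
```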
By systematically combining prompt optimization with multi-model evaluation, **WeavePrompt** aims to advance the understanding of cross-model prompt effectiveness and improve controllability in image generation tasks.
## Features
- Upload a target image
- Step-by-step prompt optimization
- View prompt and generated image at each iteration
- Full optimization history
## Installation
1. Clone the repository:
```bash
git clone https://github.com/kevin1kevin1k/WeavePrompt.git
cd WeavePrompt
```
2. Install dependencies:
```bash
uv venv
uv sync
source .venv/bin/activate
```
3. Set up `.env`:
Put the following inside `.env`:
- API keys `WANDB_API_KEY` and `FAL_KEY`
- Weave project name `WEAVE_PROJECT`
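A quick way to confirm the configuration before launching the app is to check that all three variables are defined. The helper below is a hypothetical convenience, not part of the repository; it assumes a plain `KEY=value` file format and ignores blank lines and `#` comments.

```python
from typing import Dict, List

# The three variables named in the setup step above.
REQUIRED_KEYS = ("WANDB_API_KEY", "FAL_KEY", "WEAVE_PROJECT")


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text: str) -> List[str]:
    """Return the required keys that are absent or empty."""
    values = parse_env(text)
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

For example, calling `missing_keys(open(".env").read())` returns an empty list once every variable has a value.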
## Usage
Run the demo app:
```bash
streamlit run src/app.py
```
Follow the instructions in the browser to upload an image and step through the optimization process.
## Architecture Diagram

## Outcome

Given the same prompt as the standard model, the target model produces similarly high-quality output.
## References
- https://arxiv.org/abs/1801.03924 - The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
- https://arxiv.org/abs/2510.06335 - Image Reconstruction from Highly Undersampled Data
- https://arxiv.org/abs/2510.03191 - Product-Quantised Image Representation for High-Quality Image Synthesis