# Usage Guide - WAN 2.2 Image-to-Video LoRA Demo

## Quick Start

### 1. Deploying to Hugging Face Spaces

To deploy this demo to Hugging Face Spaces:
```bash
# Install git-lfs if not already installed
git lfs install

# Create a new Space on huggingface.co
# Then clone your Space repository
git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME

# Copy all files from this demo into the Space repository
cp -r /path/to/this/demo/* YOUR_SPACE_NAME/
cd YOUR_SPACE_NAME

# Commit and push
git add .
git commit -m "Initial commit: WAN 2.2 Image-to-Video LoRA Demo"
git push
```
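Once pushed, the Space builds and starts automatically; you can follow the build progress in the Space's logs on huggingface.co.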
### 2. Running Locally

```bash
# Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Run the app
python app.py
```

The app will be available at `http://localhost:7860`.
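If you need a different port or host, Gradio's launch call accepts the standard options. A minimal sketch, assuming `app.py` ends with a `demo.launch()` call (the actual code in `app.py` may differ):

```python
# Sketch only: common Gradio launch options (app.py's actual call may differ)
demo.launch(
    server_name="0.0.0.0",  # listen on all interfaces instead of localhost only
    server_port=7860,       # Gradio's default port
)
```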
## Using the Demo

### Basic Usage

1. **Upload Image**: Click the image upload area and select an image file
2. **Enter Prompt**: Type a description of the motion you want (e.g., "A person walking forward, cinematic")
3. **Click Generate**: Wait for the video to be generated (the first run will also download the model)
4. **View Result**: The generated video will appear in the output area
### Advanced Settings

Expand the "Advanced Settings" accordion to access the options below (see the sketch after this list for how they map onto the pipeline call):

- **Inference Steps** (20-100): More steps = higher quality but slower generation
  - 20-30: Fast, lower quality
  - 50: Balanced (recommended)
  - 80-100: Slow, highest quality
- **Guidance Scale** (1.0-15.0): How closely to follow the prompt
  - 1.0-3.0: More creative, less faithful to the prompt
  - 6.0: Balanced (recommended)
  - 10.0-15.0: Very faithful to the prompt, less creative
- **Use LoRA**: Enable/disable LoRA fine-tuning
- **LoRA Type**:
  - **High-Noise**: Best for dynamic, action-heavy scenes
  - **Low-Noise**: Best for subtle, smooth motions
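These settings correspond to standard diffusers pipeline arguments. A minimal sketch, assuming the app builds on `CogVideoXImageToVideoPipeline` (the model ID and file paths here are placeholders, not necessarily the app's actual values):

```python
import torch
from diffusers import CogVideoXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

# Model ID is an assumption; the app's MODEL_ID may differ
pipe = CogVideoXImageToVideoPipeline.from_pretrained(
    "THUDM/CogVideoX-5b-I2V", torch_dtype=torch.bfloat16
).to("cuda")

frames = pipe(
    image=load_image("path/to/image.jpg"),
    prompt="A cat walking through a garden, sunny day, high quality",
    num_inference_steps=50,  # "Inference Steps" slider
    guidance_scale=6.0,      # "Guidance Scale" slider
).frames[0]

export_to_video(frames, "output.mp4", fps=8)
```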
## Example Prompts

### Good Prompts

- "A cat walking through a garden, sunny day, high quality"
- "Waves crashing on a beach, sunset lighting, cinematic"
- "A car driving down a highway, fast motion, 4k"
- "Smoke rising from a campfire, slow motion"

### Tips for Better Results

1. **Be Specific**: Include details about motion, lighting, and quality
2. **Use Keywords**: "cinematic", "high quality", "4k", "smooth"
3. **Describe Motion**: Clearly state what should move and how
4. **Consider Style**: Add style descriptors like "photorealistic" or "animated"
## Troubleshooting

### Out of Memory Error

If you encounter OOM errors:

1. The model requires significant VRAM (16GB+ recommended)
2. On Hugging Face Spaces, ensure you're using at least `gpu-medium` hardware
3. For local runs, try reducing the number of frames or enabling CPU offloading (see the sketch below)
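A minimal sketch of the usual diffusers memory optimizations, assuming the pipeline object is named `pipe` as in the earlier sketch:

```python
# Offload model components to CPU between steps; use this
# INSTEAD of pipe.to("cuda") to trade speed for VRAM
pipe.enable_model_cpu_offload()

# Reduce VAE memory use when decoding frames
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()
```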
### Slow Generation

- The first generation will be slower (the model downloads on first use)
- Reduce inference steps for faster results
- Ensure the GPU is being used (check the logs for "Loading model on cuda"; see the check below)
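To confirm that PyTorch can actually see the GPU in your environment:

```python
import torch

print(torch.cuda.is_available())           # True if a CUDA GPU is usable
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))   # prints the GPU model name
```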
### Model Not Loading

If the model fails to load:

1. Check your internet connection (the model is ~20GB)
2. Ensure sufficient disk space
3. For Hugging Face Spaces, check your Space's logs
## Customization

### Using Your Own LoRA Files

To use your own LoRA weights:

1. Upload your LoRA `.safetensors` files to Hugging Face
2. Update the URLs in `app.py`:

```python
HIGH_NOISE_LORA_URL = "https://huggingface.co/YOUR_USERNAME/YOUR_REPO/resolve/main/your_lora.safetensors"
```

3. Uncomment and implement the LoRA loading code in the `generate_video` function (a sketch follows)
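A minimal sketch of what that loading code might look like, using the standard diffusers LoRA API (the repo, file, and adapter names are illustrative):

```python
from huggingface_hub import hf_hub_download

# Download the LoRA weights (illustrative repo/file names)
lora_path = hf_hub_download(
    repo_id="YOUR_USERNAME/YOUR_REPO",
    filename="your_lora.safetensors",
)

# Attach the LoRA to the pipeline and set its strength
pipe.load_lora_weights(lora_path, adapter_name="custom_lora")
pipe.set_adapters(["custom_lora"], adapter_weights=[1.0])
```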
### Changing the Model

To use a different model:

1. Update `MODEL_ID` in `app.py` (see the sketch below)
2. Ensure the model is compatible with `CogVideoXImageToVideoPipeline`
3. Adjust the memory optimizations if needed
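For example, assuming `app.py` loads the pipeline via `from_pretrained` (the alternative model ID here is only a placeholder):

```python
# Any checkpoint loadable by CogVideoXImageToVideoPipeline works here
MODEL_ID = "YOUR_USERNAME/YOUR_COMPATIBLE_MODEL"
pipe = CogVideoXImageToVideoPipeline.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16
)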
## Performance Notes

- **GPU (A10G/T4)**: ~2-3 minutes per video
- **GPU (A100)**: ~1-2 minutes per video
- **CPU**: Not recommended (20+ minutes per video)
## API Access

For programmatic access, you can use the Gradio Client:

```python
from gradio_client import Client, handle_file

client = Client("YOUR_USERNAME/YOUR_SPACE_NAME")
result = client.predict(
    # recent gradio_client versions require handle_file() for file inputs
    image=handle_file("path/to/image.jpg"),
    prompt="A cat walking",
    api_name="/predict"
)
```
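For a video output, `result` is typically the local path to the generated file, which the client downloads automatically.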
## Credits

- Model: CogVideoX by THUDM
- Framework: Hugging Face Diffusers
- Interface: Gradio

## License

Apache 2.0 - See the LICENSE file for details