Generate edited video frames using text prompts
Create pose-guided images from text prompts
Transform images based on text instructions