Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V Paper • 2310.11441 • Published Oct 17, 2023 • 29
IDEA-Research/grounding-dino-base Zero-Shot Object Detection • 0.2B • Updated May 12, 2024 • 1.17M • 146
Wan2.1 Fast • Running on Zero • MCP • Featured • 1.58k • Generate a video from an image with a prompt
Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models Paper • 2406.09403 • Published Jun 13, 2024 • 23
Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers Paper • 2505.21497 • Published May 27 • 109