Visual grounding on videos

#13

by iariav - opened 25 days ago

Hi,
First of all, amazing work! The new Qwen-vl-3 models are awesome.

I use them for visual grounding (bounding-box detection) in videos. Since bbox detection is currently only supported for images, I run the model frame by frame at 1 fps, but that takes a very long time.
From your experience, is there a better way to get accurate, temporally consistent bbox detections from videos?
