Visual grounding on videos
#13, opened by iariav
Hi,
First of all, amazing work! The new Qwen-vl-3 models are awesome.
I use them for visual grounding (bounding boxes) in videos. Since bbox detection is currently supported only for images, I run the model frame by frame at 1 fps, but that takes a very long time.
From your experience, is there a better way to get accurate, consistent bbox detections from videos?
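For reference, here is a rough sketch of the 1 fps frame-by-frame loop described above. It is only an illustration: `sample_frame_indices` and `ground_video` are hypothetical helper names, and `detect` stands in for whatever per-image grounding call is used (it is not a Qwen API). Frame decoding is left out, so the detector receives only the frame index.

```python
from typing import Callable, Dict, List, Tuple

# (x1, y1, x2, y2) in pixel coordinates; an assumed box format for illustration.
Box = Tuple[int, int, int, int]


def sample_frame_indices(total_frames: int, native_fps: float,
                         target_fps: float = 1.0) -> List[int]:
    """Return the frame indices to process when sampling at target_fps."""
    if total_frames <= 0 or native_fps <= 0 or target_fps <= 0:
        return []
    # Never step by less than one frame, so no index is visited twice.
    step = max(native_fps / target_fps, 1.0)
    indices = []
    k = 0
    while True:
        idx = int(round(k * step))
        if idx >= total_frames:
            break
        indices.append(idx)
        k += 1
    return indices


def ground_video(total_frames: int, native_fps: float,
                 detect: Callable[[int], List[Box]],
                 target_fps: float = 1.0) -> Dict[int, List[Box]]:
    """Run a per-frame detector (hypothetical `detect`) on the sampled frames."""
    return {idx: detect(idx)
            for idx in sample_frame_indices(total_frames, native_fps, target_fps)}


if __name__ == "__main__":
    # Dummy detector that returns one fixed box per frame.
    boxes = ground_video(90, 30.0, detect=lambda idx: [(0, 0, 10, 10)])
    print(sorted(boxes))
```

Each sampled frame is a separate model call, so the runtime grows linearly with video length, which is why this approach is slow on long clips.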