LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding • Paper • 2501.08282 • Published Jan 14
Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play • Paper • 2509.25541 • Published Sep 29