Downscaling Intelligence: Exploring Perception and Reasoning Bottlenecks in Small Multimodal Models Paper • 2511.17487 • Published 19 days ago • 9
VideoAgent: Long-form Video Understanding with Large Language Model as Agent Paper • 2403.10517 • Published Mar 15, 2024 • 37
Single-View 3D Human Digitalization with Large Reconstruction Models Paper • 2401.12175 • Published Jan 22, 2024 • 6
Describing Differences in Image Sets with Natural Language Paper • 2312.02974 • Published Dec 5, 2023 • 15