Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length Paper • 2512.04677 • Published 5 days ago • 151
PaperDebugger: A Plugin-Based Multi-Agent System for In-Editor Academic Writing, Review, and Editing Paper • 2512.02589 • Published 7 days ago • 46
TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models Paper • 2512.02014 • Published 7 days ago • 57
VIDEOP2R: Video Understanding from Perception to Reasoning Paper • 2511.11113 • Published 25 days ago • 111
ARC-Chapter: Structuring Hour-Long Videos into Navigable Chapters and Hierarchical Summaries Paper • 2511.14349 • Published 21 days ago • 16
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm Paper • 2511.04570 • Published Nov 6 • 208
ROVER: Benchmarking Reciprocal Cross-Modal Reasoning for Omnimodal Generation Paper • 2511.01163 • Published Nov 3 • 31
World Simulation with Video Foundation Models for Physical AI Paper • 2511.00062 • Published Oct 28 • 40
Game-TARS: Pretrained Foundation Models for Scalable Generalist Multimodal Game Agents Paper • 2510.23691 • Published Oct 27 • 52
Gauss Gym Datasets Collection Datasets used for the gauss gym photorealistic simulator • 4 items • Updated Oct 17 • 8
FlashWorld: High-quality 3D Scene Generation within Seconds Paper • 2510.13678 • Published Oct 15 • 71
view article Article Introduction to MedVideoCap-55K: A New, Large-Scale, High-Quality Medical Video-Caption Pair Dataset Jun 25 • 10
CommonForms: A Large, Diverse Dataset for Form Field Detection Paper • 2509.16506 • Published Sep 20 • 19
EgoLife Collection CVPR 2025 - EgoLife: Towards Egocentric Life Assistant. Homepage: https://egolife-ai.github.io/ • 10 items • Updated Mar 7 • 20
Cosmos World Foundation Model Platform for Physical AI Paper • 2501.03575 • Published Jan 7 • 81
Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation Paper • 2509.19296 • Published Sep 23 • 23