The Image as Its Own Reward: Reinforcement Learning with Adversarial Reward for Image Generation • Paper • 2511.20256 • Published 15 days ago • 26
WEAVE: Unleashing and Benchmarking the In-context Interleaved Comprehension and Generation • Paper • 2511.11434 • Published 26 days ago • 44
Depth Anything 3: Recovering the Visual Space from Any Views • Paper • 2511.10647 • Published 27 days ago • 93
Grounding Computer Use Agents on Human Demonstrations • Paper • 2511.07332 • Published 30 days ago • 104
Revisiting Multimodal Positional Encoding in Vision-Language Models • Paper • 2510.23095 • Published Oct 27 • 20
VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation • Paper • 2511.02778 • Published Nov 4 • 101
ChronoPlay: A Framework for Modeling Dual Dynamics and Authenticity in Game RAG Benchmarks • Paper • 2510.18455 • Published Oct 21 • 17
StreamingVLM: Real-Time Understanding for Infinite Video Streams • Paper • 2510.09608 • Published Oct 10 • 50
Paper2Video: Automatic Video Generation from Scientific Papers • Paper • 2510.05096 • Published Oct 6 • 117
Code2Video: A Code-centric Paradigm for Educational Video Generation • Paper • 2510.01174 • Published Oct 1 • 33
Robix: A Unified Model for Robot Interaction, Reasoning and Planning • Paper • 2509.01106 • Published Sep 1 • 49
Draw-In-Mind: Learning Precise Image Editing via Chain-of-Thought Imagination • Paper • 2509.01986 • Published Sep 2 • 4
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning • Paper • 2509.02544 • Published Sep 2 • 124