WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent Paper • 2508.05748 • Published Aug 7 • 141
OneThinker: All-in-one Reasoning Model for Image and Video Paper • 2512.03043 • Published 7 days ago • 29
ARES: Multimodal Adaptive Reasoning via Difficulty-Aware Token-Level Entropy Shaping Paper • 2510.08457 • Published Oct 9 • 12
ARES Collection 🌴ARES is an open-source framework for adaptive multimodal reasoning, using difficulty-aware training and entropy-shaped policy optimization. • 5 items • Updated Oct 12 • 2
Emergent Hierarchical Reasoning in LLMs through Reinforcement Learning Paper • 2509.03646 • Published Sep 3 • 30
Reconstruction Alignment Improves Unified Multimodal Models Paper • 2509.07295 • Published Sep 8 • 40
Interleaving Reasoning for Better Text-to-Image Generation Paper • 2509.06945 • Published Sep 8 • 14
Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning Paper • 2506.04207 • Published Jun 4 • 48
ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows Paper • 2505.19897 • Published May 26 • 104