VisPlay: Self-Evolving Vision-Language Models from Images Paper • 2511.15661 • Published 20 days ago • 42
TaTToo: Tool-Grounded Thinking PRM for Test-Time Scaling in Tabular Reasoning Paper • 2510.06217 • Published Oct 7 • 63
Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training Paper • 2509.03403 • Published Sep 3 • 22
TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning Paper • 2509.25760 • Published Sep 30 • 55
TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning Paper • 2509.25760 • Published Sep 30 • 55 • 3