GGBench: A Geometric Generative Reasoning Benchmark for Unified Multimodal Models Paper • 2511.11134 • Published 26 days ago • 31
Can One Domain Help Others? A Data-Centric Study on Multi-Domain Reasoning via Reinforcement Learning Paper • 2507.17512 • Published Jul 23 • 36
REST: Stress Testing Large Reasoning Models by Asking Multiple Problems at Once Paper • 2507.10541 • Published Jul 14 • 29
view article Article Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models +1 Mar 20, 2024 • 105
Beyond One-Size-Fits-All: Inversion Learning for Highly Effective NLG Evaluation Prompts Paper • 2504.21117 • Published Apr 29 • 26
CipherBank: Exploring the Boundary of LLM Reasoning Capabilities through Cryptography Challenges Paper • 2504.19093 • Published Apr 27 • 18
A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis Paper • 2504.12322 • Published Apr 11 • 28
A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis Paper • 2504.12322 • Published Apr 11 • 28
GRA Collection [A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis](arxiv.org/abs/2504.12322) • 9 items • Updated Apr 18 • 1
GRA Collection [A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis](arxiv.org/abs/2504.12322) • 9 items • Updated Apr 18 • 1