Exploration v.s. Exploitation: Rethinking RLVR through Clipping, Entropy, and Spurious Reward Paper • 2512.16912 • Published 14 days ago • 10
GenEnv: Difficulty-Aligned Co-Evolution Between LLM Agents and Environment Simulators Paper • 2512.19682 • Published 10 days ago • 15
Geometric Framework for 3D Cell Segmentation Correction Paper • 2502.01890 • Published Feb 3, 2025
ComPO: Preference Alignment via Comparison Oracles Paper • 2505.05465 • Published May 8, 2025 • 1
Spectral Policy Optimization: Coloring your Incorrect Reasoning in GRPO Paper • 2505.11595 • Published May 16, 2025 • 1