Beyond Correctness: Evaluating Subjective Writing Preferences Across Cultures Paper • 2510.14616 • Published Oct 16, 2025 • 13
COIG-Writer: A High-Quality Dataset for Chinese Creative Writing with Thought Processes Paper • 2510.14763 • Published Oct 16, 2025 • 14
Re:Form -- Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs: A Preliminary Study on Dafny Paper • 2507.16331 • Published Jul 22, 2025 • 22
WirelessMathLM: Teaching Mathematical Reasoning for LLMs in Wireless Communications with Reinforcement Learning Paper • 2509.23219 • Published Sep 27, 2025 • 19
Local Success Does Not Compose: Benchmarking Large Language Models for Compositional Formal Verification Paper • 2509.23061 • Published Sep 27, 2025 • 7
WirelessMathBench: A Mathematical Modeling Benchmark for LLMs in Wireless Communications Paper • 2505.14354 • Published May 20, 2025 • 2