Yulai Zhao

sarosavo

http://yulaizhao.com

Recent Activity

upvoted 13 papers 15 days ago

Reinforcement Learning for Self-Improving Agent with Skill Library

Paper • 2512.17102 • Published Dec 18, 2025 • 33

Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies

Paper • 2512.19673 • Published Dec 22, 2025 • 63

Step-DeepResearch Technical Report

Paper • 2512.20491 • Published about 1 month ago • 84

LLM Swiss Round: Aggregating Multi-Benchmark Performance via Competitive Swiss-System Dynamics

Paper • 2512.21010 • Published about 1 month ago • 4

Schoenfeld's Anatomy of Mathematical Reasoning by Language Models

Paper • 2512.19995 • Published Dec 23, 2025 • 16

Rethinking Sample Polarity in Reinforcement Learning with Verifiable Rewards

Paper • 2512.21625 • Published 29 days ago • 4

Training AI Co-Scientists Using Rubric Rewards

Paper • 2512.23707 • Published 25 days ago • 21

mHC: Manifold-Constrained Hyper-Connections

Paper • 2512.24880 • Published 23 days ago • 276

Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization

Paper • 2512.24615 • Published 23 days ago • 116

E-GRPO: High Entropy Steps Drive Effective Reinforcement Learning for Flow Models

Paper • 2601.00423 • Published 22 days ago • 9

Large Reasoning Models Are (Not Yet) Multilingual Latent Reasoners

Paper • 2601.02996 • Published 17 days ago • 5

MindWatcher: Toward Smarter Multimodal Tool-Integrated Reasoning

Paper • 2512.23412 • Published 25 days ago • 38

SciEvalKit: An Open-source Evaluation Toolkit for Scientific General Intelligence

Paper • 2512.22334 • Published 28 days ago • 35

upvoted 7 papers about 1 month ago

GenEnv: Difficulty-Aligned Co-Evolution Between LLM Agents and Environment Simulators

Paper • 2512.19682 • Published Dec 22, 2025 • 16

Meta-RL Induces Exploration in Language Agents

Paper • 2512.16848 • Published Dec 18, 2025 • 11

Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs

Paper • 2512.17008 • Published Dec 18, 2025 • 11

Are We on the Right Way to Assessing LLM-as-a-Judge?

Paper • 2512.16041 • Published Dec 17, 2025 • 33

Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience

Paper • 2512.17260 • Published Dec 19, 2025 • 50

Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows

Paper • 2512.16969 • Published Dec 18, 2025 • 116

Multimodal RewardBench 2: Evaluating Omni Reward Models for Interleaved Text and Image

Paper • 2512.16899 • Published Dec 18, 2025 • 13