AlignGuard: Scalable Safety Alignment for Text-to-Image Generation Paper • 2412.10493 • Published Dec 13, 2024
LongVideoAgent: Multi-Agent Reasoning with Long Videos Paper • 2512.20618 • Published 5 days ago • 49
Latent Guard: a Safety Framework for Text-to-image Generation Paper • 2404.08031 • Published Apr 11, 2024
Fake it till You Make it: Reward Modeling as Discriminative Prediction Paper • 2506.13846 • Published Jun 16, 2025
AdaCoT: Pareto-Optimal Adaptive Chain-of-Thought Triggering via Reinforcement Learning Paper • 2505.11896 • Published May 17, 2025 • 58
Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback Paper • 2503.22230 • Published Mar 28, 2025 • 45