See, Hear, and Understand: Benchmarking Audiovisual Human Speech Understanding in Multimodal Large Language Models • Paper 2512.02231 • Published 9 days ago