Multimodal
Visual Representation Alignment for Multimodal Large Language Models
Paper • 2509.07979 • Published • 83
LatticeWorld: A Multimodal Large Language Model-Empowered Framework for Interactive Complex World Generation
Paper • 2509.05263 • Published • 10
Symbolic Graphics Programming with Large Language Models
Paper • 2509.05208 • Published • 46
OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling
Paper • 2509.12201 • Published • 105
Multimodal Reasoning for Science: Technical Report and 1st Place Solution to the ICML 2025 SeePhys Challenge
Paper • 2509.06079 • Published • 6
Lost in Embeddings: Information Loss in Vision-Language Models
Paper • 2509.11986 • Published • 28
PersonaX: Multimodal Datasets with LLM-Inferred Behavior Traits
Paper • 2509.11362 • Published • 4
UI-S1: Advancing GUI Automation via Semi-online Reinforcement Learning
Paper • 2509.11543 • Published • 47
MARS2 2025 Challenge on Multimodal Reasoning: Datasets, Methods, Results, Discussion, and Outlook
Paper • 2509.14142 • Published • 10
Qwen3-Omni Technical Report
Paper • 2509.17765 • Published • 143