Cyril666/whisper-large-v3-encoder Automatic Speech Recognition • 0.6B • Updated 4 days ago • 124
N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models Paper • 2512.16561 • Published 10 days ago • 19
RePlan: Reasoning-guided Region Planning for Complex Instruction-based Image Editing Paper • 2512.16864 • Published 10 days ago • 10
ShowTable: Unlocking Creative Table Visualization with Collaborative Reflection and Refinement Paper • 2512.13303 • Published 13 days ago • 16