OneStory: Coherent Multi-Shot Video Generation with Adaptive Memory Paper • 2512.07802 • Published 2 days ago • 31
TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models Paper • 2512.02014 • Published 9 days ago • 60
MedVLSynther: Synthesizing High-Quality Visual Question Answering from Medical Documents with Generator-Verifier LMMs Paper • 2510.25867 • Published Oct 29 • 6
Uniform Discrete Diffusion with Metric Path for Video Generation Paper • 2510.24717 • Published Oct 28 • 39
Llama 3.2 Collection This collection hosts the transformers and original repos of the Llama 3.2 and Llama Guard 3 • 15 items • Updated Dec 6, 2024 • 647
m1: Unleash the Potential of Test-Time Scaling for Medical Reasoning with Large Language Models Paper • 2504.00869 • Published Apr 1 • 10
Story-Adapter: A Training-free Iterative Framework for Long Story Visualization Paper • 2410.06244 • Published Oct 8, 2024 • 19
VoCo-LLaMA: Towards Vision Compression with Large Language Models Paper • 2406.12275 • Published Jun 18, 2024 • 31
SILC: Improving Vision Language Pretraining with Self-Distillation Paper • 2310.13355 • Published Oct 20, 2023 • 9
Contrastive Decoding Improves Reasoning in Large Language Models Paper • 2309.09117 • Published Sep 17, 2023 • 39