R-Horizon: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth? Paper • 2510.08189 • Published Oct 9 • 26
Article You could have designed state of the art positional encoding Nov 25, 2024 • 404
SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published Apr 7 • 200
Article LLM Inference on Edge: A Fun and Easy Guide to run LLMs via React Native on your Phone! Mar 7 • 88
Domino: Eliminating Communication in LLM Training via Generic Tensor Slicing and Overlapping Paper • 2409.15241 • Published Sep 23, 2024 • 1
Scaling Laws for Floating Point Quantization Training Paper • 2501.02423 • Published Jan 5 • 26
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published Apr 22, 2024 • 259
Small-scale proxies for large-scale Transformer training instabilities Paper • 2309.14322 • Published Sep 25, 2023 • 21
Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets Paper • 2201.02177 • Published Jan 6, 2022 • 2
Article A failed experiment: Infini-Attention, and why we should keep trying? Aug 14, 2024 • 71
Grokfast: Accelerated Grokking by Amplifying Slow Gradients Paper • 2405.20233 • Published May 30, 2024 • 7
Transformer Explainer: Interactive Learning of Text-Generative Models Paper • 2408.04619 • Published Aug 8, 2024 • 172