Edd

Erland

AI & ML interests

None yet

Organizations

None yet

Recent Activity

upvoted a collection 6 days ago

Ministral 3

Mistral Ministral 3: new multimodal models in Base, Instruct, and Reasoning variants, available in 3B, 8B, and 14B sizes. • 36 items • Updated 1 day ago • 21

upvoted an article 8 days ago

Tricks from OpenAI gpt-oss YOU 🫵 can use with transformers

Article • Sep 11 • 166

upvoted an article 9 days ago

Continuous batching from first principles

Article • 14 days ago • 256

upvoted 2 papers 3 months ago

Hala Technical Report: Building Arabic-Centric Instruction & Translation Models at Scale

Paper • 2509.14008 • Published Sep 17 • 88

Reasoning Vectors: Transferring Chain-of-Thought Capabilities via Task Arithmetic

Paper • 2509.01363 • Published Sep 1 • 58

upvoted a collection 3 months ago

ByteDance Papers

ByteDance papers collection • 127 items • Updated about 23 hours ago • 19

upvoted a paper 3 months ago

Predicting the Order of Upcoming Tokens Improves Language Modeling

Paper • 2508.19228 • Published Aug 26 • 23

upvoted an article 4 months ago

Prefill and Decode for Concurrent Requests - Optimizing LLM Performance

Article • Apr 16 • 56

upvoted a collection 5 months ago

Indonesian Text Similarity Dataset

This collection contains curated text similarity datasets that are available on Hugging Face Datasets • 16 items • Updated Jul 11 • 5

upvoted an article 5 months ago

No GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL

Article • Jun 3 • 96

upvoted a collection 6 months ago

Gemma 3n

Google Gemma 3n models, all versions including Dynamic GGUF, 4-bit, and 16-bit formats! • 10 items • Updated 7 days ago • 25

upvoted a paper 6 months ago

Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9 • 263

upvoted 2 papers 7 months ago

Softpick: No Attention Sink, No Massive Activations with Rectified Softmax

Paper • 2504.20966 • Published Apr 29 • 32

The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs

Paper • 2504.17768 • Published Apr 24 • 13

upvoted a collection 10 months ago

Mistral Small 3 (All Versions)

A collection of Mistral's new Small 3.2 and 3 models including GGUF, 4-bit and more! • 20 items • Updated 7 days ago • 18

upvoted a collection 11 months ago

DeepSeek R1 (All Versions)

DeepSeek-R1-0528 is here! The most powerful reasoning open LLM, available in GGUF, original & 4-bit formats. Includes Llama & Qwen distilled models. • 37 items • Updated 7 days ago • 261

upvoted 2 papers 11 months ago

rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking

Paper • 2501.04519 • Published Jan 8 • 286

Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought

Paper • 2501.04682 • Published Jan 8 • 99

upvoted a collection 11 months ago

Phi-4 (All Versions)

Microsoft's Phi-4 models including Reasoning + Reasoning Plus & mini. Includes Dynamic 2.0 GGUF, 4-bit & 16-bit versions. Includes Unsloth's bug fixes • 20 items • Updated 7 days ago • 76

upvoted a paper 11 months ago

LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token

Paper • 2501.03895 • Published Jan 7 • 52