- Can Large Language Models Understand Context?
  Paper • 2402.00858 • Published • 23
- OLMo: Accelerating the Science of Language Models
  Paper • 2402.00838 • Published • 85
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 151
- SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
  Paper • 2401.17072 • Published • 25
Collections
Collections including paper arxiv:2407.21783
- Language Models are Few-Shot Learners
  Paper • 2005.14165 • Published • 18
- Large Language Models Are Human-Level Prompt Engineers
  Paper • 2211.01910 • Published • 1
- Lost in the Middle: How Language Models Use Long Contexts
  Paper • 2307.03172 • Published • 43
- Large Language Models are Zero-Shot Reasoners
  Paper • 2205.11916 • Published • 3
- Flowing from Words to Pixels: A Framework for Cross-Modality Evolution
  Paper • 2412.15213 • Published • 28
- No More Adam: Learning Rate Scaling at Initialization is All You Need
  Paper • 2412.11768 • Published • 43
- Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
  Paper • 2412.13663 • Published • 158
- Autoregressive Video Generation without Vector Quantization
  Paper • 2412.14169 • Published • 14
- DAPO: An Open-Source LLM Reinforcement Learning System at Scale
  Paper • 2503.14476 • Published • 142
- Training language models to follow instructions with human feedback
  Paper • 2203.02155 • Published • 24
- Llama 2: Open Foundation and Fine-Tuned Chat Models
  Paper • 2307.09288 • Published • 247
- The Llama 3 Herd of Models
  Paper • 2407.21783 • Published • 117
- Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning
  Paper • 2211.04325 • Published • 1
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 24
- On the Opportunities and Risks of Foundation Models
  Paper • 2108.07258 • Published • 1
- Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks
  Paper • 2204.07705 • Published • 2
- Qwen Technical Report
  Paper • 2309.16609 • Published • 37
- Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
  Paper • 2311.07919 • Published • 10
- Qwen2 Technical Report
  Paper • 2407.10671 • Published • 167
- Qwen2-Audio Technical Report
  Paper • 2407.10759 • Published • 62