Collections
Collections including paper arxiv:2212.04356
-
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions
Paper • 2312.08578 • Published • 20 -
ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks
Paper • 2312.08583 • Published • 11 -
Vision-Language Models as a Source of Rewards
Paper • 2312.09187 • Published • 14 -
StemGen: A music generation model that listens
Paper • 2312.08723 • Published • 49
-
AutoCLIP: Auto-tuning Zero-Shot Classifiers for Vision-Language Models
Paper • 2309.16414 • Published • 19 -
Dynamic ASR Pathways: An Adaptive Masking Approach Towards Efficient Pruning of A Multilingual ASR Model
Paper • 2309.13018 • Published • 9 -
Robust Speech Recognition via Large-Scale Weak Supervision
Paper • 2212.04356 • Published • 43 -
Language models in molecular discovery
Paper • 2309.16235 • Published • 10
-
Robust Speech Recognition via Large-Scale Weak Supervision
Paper • 2212.04356 • Published • 43 -
Conformer: Convolution-augmented Transformer for Speech Recognition
Paper • 2005.08100 • Published • 1 -
wav2vec: Unsupervised Pre-training for Speech Recognition
Paper • 1904.05862 • Published -
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
Paper • 2006.11477 • Published • 8
-
Whisper
📉 2.63k • Transcribe audio files or YouTube videos into text
-
Robust Speech Recognition via Large-Scale Weak Supervision
Paper • 2212.04356 • Published • 43 -
openai/whisper-large-v2
Automatic Speech Recognition • 2B • Updated • 181k • 1.78k -
openai/whisper-large
Automatic Speech Recognition • 2B • Updated • 120k • 527