Tokenizer Choice For LLM Training: Negligible or Crucial? Paper • 2310.08754 • Published Oct 12, 2023 • 3
Neighborhood Contrastive Learning for Scientific Document Representations with Citation Embeddings Paper • 2202.06671 • Published Feb 14, 2022 • 2
Specialized Document Embeddings for Aspect-based Similarity of Research Papers Paper • 2203.14541 • Published Mar 28, 2022
Investigating Gender Bias in Turkish Language Models Paper • 2404.11726 • Published Apr 17, 2024 • 1
Efficient Language Model Training through Cross-Lingual and Progressive Transfer Learning Paper • 2301.09626 • Published Jan 23, 2023 • 2
MMTEB: Massive Multilingual Text Embedding Benchmark Paper • 2502.13595 • Published Feb 19, 2025 • 43