Loïck BOURDOIS PRO

lbourdois

https://lbourdois.github.io/blog/

BdsLoick
lbourdois
lbourdois
lbourdois.bsky.social

AI & ML interests

👀

Recent Activity

commented on their article about 15 hours ago

Model statistics of the 50 most downloaded entities on Hugging Face

commented on their article about 18 hours ago

Model statistics of the 50 most downloaded entities on Hugging Face

lbourdois's collections (14)

French packs

French resources (datasets & models) I developed to support use cases in French

French prompts datasets

Collection

French prompt datasets developed when I worked at CATIE (https://hf.co/CATIE-AQ). Over 30,000 downloads. • 3 items • Updated Aug 8
French NER

Collection

NER models & datasets developed when I worked at CATIE (https://hf.co/CATIE-AQ). Over 170,000 downloads. • 11 items • Updated Aug 8
French QA

Collection

QA models & datasets developed when I worked at CATIE (https://hf.co/CATIE-AQ). Over 150,000 downloads. • 6 items • Updated Aug 8
French VQA datasets

Collection

VQA datasets I cleaned; each example has an image, a question, and an answer. Can be used to train VLMs. • 12 items • Updated Aug 8

FAT5

Flash Attention T5 (FAT5) models developed when I worked at CATIE (https://hf.co/CATIE-AQ).

FAT5 (Flash Attention T5) report ⚡

Running • 9
English version of the blog post introducing the FAT5 model
Le FAT5 : Flash Attention T5 ⚡

Running
French version of the blog post introducing the FAT5 model
CATIE-AQ/FAT5-small

0.1B • Updated Mar 17 • 13 • 2

French NER

NER models & datasets developed when I worked at CATIE (https://hf.co/CATIE-AQ). Over 170,000 downloads.

CATIE-AQ/Moderncamembert_3entities

Token Classification • 0.1B • Updated Apr 22 • 9 • 1
CATIE-AQ/NERmemberta-3entities

Token Classification • 0.1B • Updated Dec 5, 2024 • 33 • 1
CATIE-AQ/NERmembert-base-3entities

Token Classification • 0.1B • Updated Nov 26, 2024 • 75 • 2
CATIE-AQ/NERmembert-large-3entities

Token Classification • 0.3B • Updated Nov 26, 2024 • 93 • 2

French prompts datasets

French prompt datasets developed when I worked at CATIE (https://hf.co/CATIE-AQ). Over 30,000 downloads.

CATIE-AQ/DFP

Viewer • Updated Nov 3 • 108M • 2.89k • 8
CATIE-AQ/facebook-community-alignment-dataset_french_dpo

Viewer • Updated Jul 29 • 71.5k • 47 • 1
CATIE-AQ/everyday-conversations-llama3.1-2k-in-french

Viewer • Updated Jul 31 • 2.38k • 50 • 1

French VQA datasets

VQA datasets I cleaned; each example has an image, a question, and an answer. Can be used to train VLMs.

CATIE-AQ/VQA-cmarkea-table-vqa-clean

Viewer • Updated Jul 15 • 84.1k • 225
CATIE-AQ/VQA-cmarkea-doc-vqa-clean

Viewer • Updated Jul 15 • 60.9k • 736
lbourdois/VQA-neulab-CulturalGround-clean

Viewer • Updated Aug 12 • 1.71M • 3.43k • 1
lbourdois/VQA-worldcuisines-vqa-clean

Viewer • Updated Jul 15 • 38.4k • 191

French OCR datasets

Datasets I cleaned; each example has an image, a prompt question (like "transcribe the text in this image"), and an answer. Can be used to train VLMs.

lbourdois/OCR-neulab-PangeaInstruct-OCR-clean

Viewer • Updated Jul 15 • 30k • 227
lbourdois/OCR-liboaccn-OPUS-MIT-5M-clean

Viewer • Updated Jul 15 • 530k • 571
lbourdois/OCR-nvidia-Nemotron-VLM-Dataset-v2_wiki_fr-clean

Viewer • Updated Nov 2 • 200k • 206

French table-to-text datasets

In 2021, before the release of LoRA, I was interested in prefix-tuning and wanted to apply it to French, so I had to translate table-to-text data.

CATIE-AQ/web_nlg_french

Viewer • Updated Jul 11 • 35.4k • 64
CATIE-AQ/e2e_nlg_french

Viewer • Updated Jul 11 • 33.5k • 31
CATIE-AQ/viggo_french

Viewer • Updated Jul 11 • 5.1k • 59

French Translations

Things I've translated: courses, blog posts, guides. More on my personal blog (https://lbourdois.github.io/blog/).

Free online AI courses in French 📚

Running • 4
French translations of four AI courses
lbourdois/en-fr-nyu-dl-course-corpus

Viewer • Updated Jul 15 • 3.13k • 90 • 1
SSM Blog Posts 📝

Sleeping • 4
Blog posts about State Space Models (SSM)
Guide sur l'évaluation des LLM ⚖

Running • 2
Translation of Clémentine Fourrier's guide

Breton packs

Breton resources (datasets & models) I developed to support use cases in Breton

BR - Archive sonore en breton

Collection

⚠ Upload in progress. Access requests will only be processed once the upload is complete. Currently 13,228 h 12 min 30 s of raw audio. • 30 items • Updated Jun 7 • 1
BR - Audio (pré-entraînement)

Collection

List of resources representing 13,846 h 51 min 16 s of raw audio in Breton • 39 items • Updated Jun 7
BR - Audio (finetuning ASR en breton)

Collection

Audio recordings with their transcriptions in Breton. • 10 items • Updated Jun 15 • 1
Bretagne/whisper-large-v3-turbo-audio_breton-transcription_breton

Automatic Speech Recognition • 0.8B • Updated Aug 12 • 29 • 1

French QA

QA models & datasets developed when I worked at CATIE (https://hf.co/CATIE-AQ). Over 150,000 downloads.

CATIE-AQ/QAmemberta

Question Answering • 0.1B • Updated Nov 26, 2024 • 21 • 1
CATIE-AQ/ModernQAmembert

Question Answering • 0.1B • Updated Apr 22 • 5
CATIE-AQ/QAmembert-large

Question Answering • 0.3B • Updated Nov 26, 2024 • 244 • 13
CATIE-AQ/QAmembert

Question Answering • 0.1B • Updated Nov 26, 2024 • 169 • 14

French embedding datasets

French datasets for training or evaluating embedding models.

CATIE-AQ/frenchSTS

Viewer • Updated Jul 15 • 45.7k • 82 • 1
CATIE-AQ/frenchNLI

Viewer • Updated Jul 29 • 570k • 69 • 1
NanoBEIR-fr 🍺

Collection

French translation of zeta-alpha-ai's NanoBEIR collection • 13 items • Updated Oct 31 • 2

French caption datasets

Datasets I cleaned; each example has an image, a prompt question (like "describe this image"), and an answer. Can be used to train VLMs.

lbourdois/caption-maya-multimodal-pretrain-clean

Viewer • Updated Jul 15 • 551k • 455
CATIE-AQ/caption-vidore-vdsid_french-clean

Viewer • Updated Jul 15 • 5k • 65
CATIE-AQ/caption-vidore-tabfquad_test_subsampled-clean

Viewer • Updated Jul 15 • 280 • 45
CATIE-AQ/caption-floschne-xm3600-clean

Viewer • Updated Jul 15 • 8.56k • 30

French retriever datasets

Datasets I cleaned; each example has an image and a question. Can be used to train visual retrievers (ColPali and co.).

CATIE-AQ/retriever-vidore-vdsid_french-clean

Viewer • Updated Jul 15 • 5k • 87
CATIE-AQ/retriever-vidore-tabfquad_test_subsampled-clean

Viewer • Updated Jul 15 • 280 • 39
CATIE-AQ/retriever-manu-tabfquad_retrieving-clean

Viewer • Updated Jul 15 • 1.83k • 63
CATIE-AQ/retriever-princeton-nlp-CharXiv-clean

Viewer • Updated Jul 15 • 1.32k • 34

French audio datasets (pretraining)

Around 117K hours of French audio for research purposes.

lbourdois/radios_et_podcasts_en_ligne_0

Updated Jun 3 • 7
lbourdois/radios_et_podcasts_en_ligne_1

Preview • Updated Jun 3 • 3
lbourdois/radios_et_podcasts_en_ligne_2

Preview • Updated Jun 3 • 2
lbourdois/radios_et_podcasts_en_ligne_3

Preview • Updated Jun 3 • 3
