CruciVerbi

Activity Feed

Molbap
posted an update 2 months ago
🚀 New blog: Maintain the unmaintainable – 1M+ Python LOC, 400+ models

How do you stop a million-line library built by thousands of contributors from collapsing under its own weight?
At 🤗 Transformers, we do it with explicit software-engineering tenets: principles that make the codebase hackable at scale.

๐Ÿ” Inside the post:
โ€“ One Model, One File: readability first โ€” you can still open a modeling file and see the full logic, top to bottom.
โ€“ Modular Transformers: visible inheritance that cuts maintenance cost by ~15ร— while keeping models readable.
โ€“ Config-Driven Performance: FlashAttention, tensor parallelism, and attention scheduling are config-level features, not rewrites.
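
For a concrete flavor of the config-driven idea, here is a minimal sketch using the public from_pretrained API (the checkpoint id is illustrative, not taken from the post):

import torch
from transformers import AutoModelForCausalLM

# Performance features are switched on through arguments/config, not code rewrites.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",  # illustrative checkpoint
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # or "sdpa" / "eager"
)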

Written with @lysandre, @pcuenq and @yonigozlan, this is a deep dive into how Transformers stays fast, open, and maintainable.

Read it here → transformers-community/Transformers-tenets

tense
#3 opened 2 months ago by reach-vb

nits
#2 opened 2 months ago by reach-vb

suggestions
#1 opened 2 months ago by pcuenq

lysandre
posted an update 3 months ago
We're kick-starting the process of Transformers v5, with @ArthurZ and @cyrilvallez!

v5 should be significant: we're using it as a milestone for performance optimizations, saner defaults, and a much cleaner code base worthy of 2025.

Fun fact: v4.0.0-rc-1 came out on Nov 19, 2020, nearly five years ago!
mcpotato
posted an update 9 months ago
Stoked to announce we've partnered with JFrog to continue improving safety on the Hub! 🐸

Their model scanner brings new scanning capabilities to the table, aimed at reducing alert fatigue.

More on that in our blog post: https://huggingface.co/blog/jfrog
lysandre
posted an update 10 months ago
SmolVLM-2 and SigLIP-2 are now part of transformers in dedicated releases!

They're added on top of the v4.49.0 release, and can be installed from the following tags: v4.49.0-SmolVLM-2 and v4.49.0-SigLIP-2.

This marks a new beginning for the release process of transformers. For the past five years, we've been doing monthly releases featuring many models (v4.49.0, the latest release, features 9 new architectures).

Starting with SmolVLM-2 & SigLIP-2, we'll now additionally release tags supporting new models on a stable branch. These models are therefore directly available for use by installing from the tag itself. These tags will continue to be updated with fixes applied to these models.

Going forward, continue expecting software releases following semantic versioning: v4.50.0 will have ~10 new architectures compared to v4.49.0, as well as a myriad of new features, improvements and bug fixes. Accompanying these software releases, we'll release tags offering brand new models as fast as possible, to make them accessible to all immediately.
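
As a quick sketch, installing from one of these tags is a standard pip-from-git install (the tag names are the ones above):

# In a shell: pip install git+https://github.com/huggingface/transformers@v4.49.0-SmolVLM-2
# (same pattern for v4.49.0-SigLIP-2); then verify what got installed:
import transformers
print(transformers.__version__)  # should report a 4.49-based version for these tags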
ArthurZ
posted an update about 1 year ago
Native tensor parallelism has landed in transformers (https://github.com/huggingface/transformers/pull/34184). Thanks a lot to the torch team for their support!

Contributions are welcome to support more models! 🔥
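
A hedged usage sketch (tp_plan="auto" is the from_pretrained argument added by the PR above; the checkpoint id is illustrative, and the script is meant to be launched with torchrun across several GPUs):

import torch
from transformers import AutoModelForCausalLM

# Shards supported models across the available GPUs using torch's native tensor parallelism.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",  # illustrative checkpoint
    torch_dtype=torch.bfloat16,
    tp_plan="auto",
)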
pcuenq
posted an update over 1 year ago
OpenELM in Core ML

Apple recently released a set of efficient LLMs in sizes varying between 270M and 3B parameters. Their quality, according to benchmarks, is similar to OLMo models of comparable size, but they required half the pre-training tokens because they use layer-wise scaling, where the number of attention heads increases in deeper layers.

I converted these models to Core ML for use on Apple Silicon, using this script: https://gist.github.com/pcuenca/23cd08443460bc90854e2a6f0f575084. The converted models were uploaded to this community on the Hub for anyone who wants to integrate them inside their apps: corenet-community/openelm-core-ml-6630c6b19268a5d878cfd194

The conversion was done with the following parameters (see the sketch after this list):
- Precision: float32.
- Sequence length: fixed to 128.
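
For a rough idea of what such a conversion involves, here's a hedged coremltools sketch (the gist above is the actual script; the wrapper and names below are illustrative, not its exact code):

import numpy as np
import torch
import coremltools as ct
from transformers import AutoModelForCausalLM

class LogitsWrapper(torch.nn.Module):
    """Return plain logits so the model can be traced."""
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, input_ids):
        return self.model(input_ids).logits

base = AutoModelForCausalLM.from_pretrained("apple/OpenELM-270M", trust_remote_code=True)
wrapped = LogitsWrapper(base.eval())

example = torch.zeros((1, 128), dtype=torch.int64)  # fixed sequence length of 128
traced = torch.jit.trace(wrapped, example)

mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="input_ids", shape=(1, 128), dtype=np.int32)],
    compute_precision=ct.precision.FLOAT32,  # float32, since float16 currently loses accuracy
)
mlmodel.save("OpenELM-270M.mlpackage")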

With swift-transformers (https://github.com/huggingface/swift-transformers), I'm getting about 56 tok/s with the 270M on my M1 Max, and 6.5 with the largest 3B model. These speeds could be improved by converting to float16. However, there's some precision loss somewhere and generation doesn't work in float16 mode yet. I'm looking into this and will keep you posted! Or take a look at this issue if you'd like to help: https://github.com/huggingface/swift-transformers/issues/95

I'm also looking at optimizing inference using an experimental kv cache in swift-transformers. It's a bit tricky because the layers have varying numbers of attention heads, but I'm curious to see how much this feature can accelerate inference in this model family :)

Regarding the instruct fine-tuned models, I don't know the chat template that was used. The models use the Llama 2 tokenizer, but neither the Llama 2 chat template nor the default Alignment Handbook one used for training is recognized. Any ideas on this are welcome!
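
One way to poke at this, as a hedged sketch (the tokenizer repo below is the gated Llama 2 one the models reuse, so it needs accepted access):

from transformers import AutoTokenizer

# Check whether the tokenizer bundles a chat template at all.
tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
print(tok.chat_template)  # if None, the template used for fine-tuning remains unknown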
Molbap
posted an update over 1 year ago
🚀🚀 Exciting times for the document AI community!

We're thrilled to announce the release of some of the largest OCR datasets available to the public.
🔥 With over 26 million pages, 18 billion text tokens, and 6TB of data, these resources are a significant leap forward for document AI research.

Here's how to access these datasets quickly:

from datasets import load_dataset

# streaming=True iterates over the data without downloading the full 6TB locally
pdfa_dataset = load_dataset('pixparse/pdfa-eng-wds', streaming=True)
idl_dataset = load_dataset('pixparse/idl-wds', streaming=True)

This enables you to stream them directly, integrating seamlessly with your projects via the Hugging Face datasets library. On the Hub, you can find them here:

pixparse/pdfa-eng-wds
pixparse/idl-wds

For lean data loading, the new chug library (https://github.com/huggingface/chug) offers a solution with PDF decoding:


import chug

task_cfg = chug.DataTaskDocReadCfg(
    page_sampling='all',  # decode every page of each document
)
data_cfg = chug.DataCfg(
    source='pixparse/pdfa-eng-wds',
    split='train',
    batch_size=None,    # yield individual samples rather than batches
    format='hfids',     # read through the Hugging Face datasets backend
    num_workers=0,      # single-process loading; raise for parallel decoding
)
data_loader = chug.create_loader(
    data_cfg,
    task_cfg,
)
sample = next(iter(data_loader))  # pull one decoded sample from the stream



We owe a huge thank you to Peter Wyatt, Kate Tasker, Rachel Taketa, Ali Furkan Biten, Ruben Tito, and their colleagues for their contributions. Their work putting these datasets together has been invaluable. 🤗

Looking Ahead:

We're on a mission to enhance document AI capabilities, and these datasets are just the beginning. With your engagement and innovation, we're confident in the community's ability to develop robust OCR solutions. We encourage you to explore these datasets, experiment with the code, and contribute to the collective progress in document AI.

For detailed information on usage and licensing, please refer to the dataset cards on the Hugging Face Hub.
ArthurZ
posted an update almost 2 years ago