New model architecture: DistilBERT
Adding Huggingface's new transformer architecture, DistilBERT, described in Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT by Victor Sanh, Lysandre Debut and Thomas Wolf. This new model architecture comes with two pretrained checkpoints:
distilbert-base-uncased: the base DistilBERT model
distilbert-base-uncased-distilled-squad: DistilBERT model fine-tuned with distillation on SQuAD

An awaited new pretrained checkpoint: GPT-2 large (774M parameters)
The third OpenAI GPT-2 checkpoint (GPT-2 large) is available in the library under the shortcut name gpt2-large: 774M parameters, 36 layers, and 20 heads.

New XLM multilingual pretrained checkpoints in 17 and 100 language...
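For illustration, a minimal loading sketch assuming a recent transformers install; the checkpoint names are the shortcuts listed above, and the forward-pass API follows the library's current interface:

```python
import torch
from transformers import DistilBertTokenizer, DistilBertModel, GPT2Tokenizer, GPT2LMHeadModel

# Base DistilBERT checkpoint from this release.
tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
model = DistilBertModel.from_pretrained("distilbert-base-uncased")

inputs = tokenizer("DistilBERT is smaller, faster, cheaper, lighter.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, sequence_length, hidden_size)

# The new GPT-2 large checkpoint under its shortcut name.
gpt2_tokenizer = GPT2Tokenizer.from_pretrained("gpt2-large")
gpt2 = GPT2LMHeadModel.from_pretrained("gpt2-large")
```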
New NLLB is now available, allowing you to translate up to 200 languages. It is also much faster. N...
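As a sketch only: one way to run NLLB-200 translation is through the Hugging Face transformers API; the facebook/nllb-200-distilled-600M checkpoint name and the FLORES-style language codes below are assumptions, not taken from the note above.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Assumed checkpoint; any NLLB-200 checkpoint follows the same pattern.
name = "facebook/nllb-200-distilled-600M"
tokenizer = AutoTokenizer.from_pretrained(name, src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(name)

inputs = tokenizer("Machine translation is useful.", return_tensors="pt")
# Force the decoder to start generating in the target language (French here).
generated = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("fra_Latn"),
    max_length=64,
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```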
Deploying large language models (LLMs) is challenging because they are memory inefficient and comput...
We introduce FLOTA (Few Longest Token Approximation), a simple yet effective method to improve the t...
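The note is cut off before the method itself; purely as an illustration of a "few longest tokens" greedy split over an existing subword vocabulary (a toy sketch, not the authors' implementation):

```python
def longest_token_split(word, vocab, k=3):
    """Toy greedy split: repeatedly take the longest vocabulary entry
    contained in the word (at most k pieces), then return the pieces
    in surface order. Illustration only."""
    pieces = []  # (start position in the original word, token)
    remaining = word
    for _ in range(k):
        candidates = [tok for tok in vocab if tok in remaining]
        if not candidates:
            break
        best = max(candidates, key=len)
        pieces.append((word.find(best), best))
        # Blank out the matched span so it cannot be matched again.
        remaining = remaining.replace(best, "\0" * len(best), 1)
    return [tok for _, tok in sorted(pieces)]

print(longest_token_split("undesirable", {"un", "desir", "able", "sir"}))
# -> ['un', 'desir', 'able']
```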
New model architectures: CTRL, DistilGPT-2
Two new models have been added since release 2.0. CTRL (...
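A hedged loading sketch; the hub names below (Salesforce/ctrl, distilgpt2) are the current ones and are assumed rather than quoted from the truncated note:

```python
from transformers import CTRLTokenizer, CTRLLMHeadModel, GPT2Tokenizer, GPT2LMHeadModel

# CTRL: conditional generation steered by a control code at the start of the prompt.
ctrl_tokenizer = CTRLTokenizer.from_pretrained("Salesforce/ctrl")
ctrl_model = CTRLLMHeadModel.from_pretrained("Salesforce/ctrl")

# DistilGPT-2 reuses the GPT-2 classes under the "distilgpt2" checkpoint name.
tokenizer = GPT2Tokenizer.from_pretrained("distilgpt2")
model = GPT2LMHeadModel.from_pretrained("distilgpt2")

inputs = tokenizer("The release notes say", return_tensors="pt")
output = model.generate(**inputs, max_length=30, do_sample=True)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```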
New model architectures: ALBERT, CamemBERT, GPT2-XL, DistilRoberta
Four new models have been added s...
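For illustration, each of the four can be loaded through the Auto classes; the checkpoint names below (albert-base-v1, camembert-base, gpt2-xl, distilroberta-base) are assumptions based on the usual shortcuts, and gpt2-xl in particular is a multi-gigabyte download:

```python
from transformers import AutoTokenizer, AutoModel

# Assumed shortcut names for the four new architectures.
for name in ["albert-base-v1", "camembert-base", "gpt2-xl", "distilroberta-base"]:
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModel.from_pretrained(name)
    print(name, "->", model.config.model_type)
```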
Rust tokenizers (@mfuntowicz, @n1t0)
Tokenizers for Bert, Roberta, OpenAI GPT, OpenAI GPT2, Transf...
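In later versions of the library the Rust-backed tokenizers surface as the "fast" classes; a sketch assuming that interface (the character offsets shown are one of the features the Rust implementation enables):

```python
from transformers import BertTokenizerFast

# Rust-backed tokenizer with the same API as the Python one, plus character offsets.
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
encoding = tokenizer("Rust tokenizers are fast.", return_offsets_mapping=True)
print(encoding["input_ids"])
print(encoding["offset_mapping"])  # (start, end) character spans per token
```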
FlauBERT, MMBT
MMBT was added to the list of available models, as the first multi-modal model to ma...
New Model: BART (added by @sshleifer)
Bart is one of the first Seq2Seq models in the library, and ac...
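A short summarization sketch; the facebook/bart-large-cnn checkpoint name is the current hub name and is assumed here rather than quoted from the note:

```python
from transformers import BartTokenizer, BartForConditionalGeneration

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")

article = "The library adds BART, one of its first sequence-to-sequence models. " * 5
inputs = tokenizer(article, return_tensors="pt", truncation=True)
summary_ids = model.generate(**inputs, num_beams=4, max_length=60, early_stopping=True)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```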
Trainer & TFTrainer
Version 2.9 introduces a new Trainer class for PyTorch, and its equivalent TFTra...
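A minimal Trainer wiring sketch with a throwaway in-memory dataset; the checkpoint and hyperparameters are arbitrary, and the argument names follow a recent transformers version rather than the 2.9 release exactly:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)

# Tiny in-memory dataset, only to show how the pieces plug together.
texts, labels = ["great library", "broken build", "love it", "hate it"], [1, 0, 1, 0]
encodings = tokenizer(texts, padding=True, truncation=True)

class ToyDataset(torch.utils.data.Dataset):
    def __len__(self):
        return len(labels)
    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in encodings.items()}
        item["labels"] = torch.tensor(labels[idx])
        return item

args = TrainingArguments(output_dir="out", num_train_epochs=1, per_device_train_batch_size=2)
trainer = Trainer(model=model, args=args, train_dataset=ToyDataset(), eval_dataset=ToyDataset())
trainer.train()
print(trainer.evaluate())
```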
New class Pipeline (beta): easily run and use models on down-stream NLP tasks
We have added a new cl...
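For illustration, two tasks through the new class, letting the library pick its default checkpoints:

```python
from transformers import pipeline

# Sentiment analysis with the task's default checkpoint.
classifier = pipeline("sentiment-analysis")
print(classifier("Pipelines make downstream tasks a one-liner."))

# Question answering works the same way, given a question and a context.
qa = pipeline("question-answering")
print(qa(question="What was added?", context="A new Pipeline class was added to the library."))
```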
Marian (@sshleifer)
A new model architecture, MarianMTModel, with 1,008+ pretrained weights is avail...
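One of the pretrained pairs as a translation sketch; the Helsinki-NLP/opus-mt-en-de checkpoint name is assumed as a representative example:

```python
from transformers import MarianMTModel, MarianTokenizer

name = "Helsinki-NLP/opus-mt-en-de"  # English -> German; one of the 1,000+ pairs
tokenizer = MarianTokenizer.from_pretrained(name)
model = MarianMTModel.from_pretrained(name)

batch = tokenizer(["Marian models cover many language pairs."], return_tensors="pt", padding=True)
translated = model.generate(**batch)
print(tokenizer.batch_decode(translated, skip_special_tokens=True))
```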
Longformer
Longformer (@ibeltagy)
Longformer for QA (@patil-suraj + @patrickvonplaten)
Longformer f...
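A minimal sketch of running a long input through the base model; the allenai/longformer-base-4096 checkpoint name is assumed:

```python
import torch
from transformers import LongformerTokenizer, LongformerModel

tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")
model = LongformerModel.from_pretrained("allenai/longformer-base-4096")

# Long inputs are the point: sequences of up to 4,096 tokens with sparse attention.
text = "Long documents need long context. " * 200
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=4096)
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)
```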
T5 Model (@patrickvonplaten, @thomwolf)
T5 is a powerful encoder-decoder model that formats every N...
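For illustration of the text-to-text format, assuming the t5-small checkpoint:

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Every task is cast as text-to-text: the task is named in the input prefix.
inputs = tokenizer("translate English to German: The house is wonderful.", return_tensors="pt")
outputs = model.generate(**inputs, max_length=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```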
Better backward-compatibility for tokenizers following v3.0.0 refactoring
Version v3.0.0 included a...
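The note is truncated, so exactly which compatibility paths it covers is unclear; as one hedged example, the post-v3.0.0 tokenizer call and the older encode_plus call both remain available and agree:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# New-style call introduced around v3.0.0 ...
new_style = tokenizer("Hello world", padding="max_length", max_length=8, truncation=True)

# ... and the older encode_plus call it superseded; both keep working.
old_style = tokenizer.encode_plus("Hello world", padding="max_length", max_length=8, truncation=True)

assert new_style["input_ids"] == old_style["input_ids"]
```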
Name change: welcome Transformers
Following the extension to TensorFlow 2.0, pytorch-transformers =...