New model architectures: ALBERT, CamemBERT, GPT2-XL, DistilRoberta. Four new models have been added in v2.2.0. ALBERT (PyTorch & TF) (from Google Research and the Toyota Technological Institute at Chicago), released with the paper ALBERT: A Lite BERT for Self-supervised Learning of Language Representations by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut. CamemBERT (PyTorch) (from Facebook AI Research, INRIA, and La Sorbonne Université), the first large-scale Transformer language model for French. Released alongside the paper CamemBERT: a Tasty French Language Model by Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suarez, Yoann Dupont, Laurent Romary, Eric Villemonte de la Clergerie, Djame Seddah, and...
Perceiver. The Perceiver model was released in the previous version: Perceiver. Eight new models are r...
In 2017, Vaswani et al. proposed a new neural network architecture named Transformer. That modern ar...
Transformer-based neural models are used in many AI applications. Training these models is expensive...
New model architecture: DistilBERT. Adding Hugging Face's new transformer architecture, DistilBERT des...
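DistilBERT is trained with knowledge distillation: the small student model learns to match the large teacher's softened output distribution, not just the hard labels. A minimal sketch of that soft-target loss, in plain Python with an assumed temperature hyperparameter (not the library's implementation):

```python
import math

def softmax(logits, temperature=1.0):
    # Scale logits by temperature, then normalize to a probability distribution.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy of the student against the teacher's softened distribution.

    A temperature > 1 softens both distributions, exposing the teacher's
    relative preferences among the non-argmax classes ("dark knowledge").
    """
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student soft predictions
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

# A student whose logits mirror the teacher's incurs a lower loss than one
# that only gets the argmax right.
teacher = [4.0, 1.0, 0.5]
close_student = [3.8, 1.1, 0.4]
far_student = [4.0, -3.0, 3.5]
assert distillation_loss(teacher, close_student) < distillation_loss(teacher, far_student)
```

In the actual training objective this term is combined with the usual supervised loss; here only the distillation term is shown.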
New model architectures: CTRL, DistilGPT-2 Two new models have been added since release 2.0. CTRL (...
FlauBERT, MMBT MMBT was added to the list of available models, as the first multi-modal model to ma...
Marian (@sshleifer) A new model architecture, MarianMTModel with 1,008+ pretrained weights is avail...
New Model: BART (added by @sshleifer) Bart is one of the first Seq2Seq models in the library, and ac...
Trainer & TFTrainer Version 2.9 introduces a new Trainer class for PyTorch, and its equivalent TFTra...
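The point of a Trainer-style class is separation of concerns: the user supplies a model-specific training step, and the trainer owns the epoch/batch loop and loss bookkeeping. A toy sketch of that pattern (a hypothetical minimal class, not the Transformers `Trainer` API):

```python
class Trainer:
    """Minimal sketch of the Trainer pattern: the caller provides a
    training_step(batch) -> loss callable; the trainer drives the loop."""

    def __init__(self, training_step, dataset, batch_size=2, epochs=3):
        self.training_step = training_step
        self.dataset = dataset
        self.batch_size = batch_size
        self.epochs = epochs

    def _batches(self):
        for i in range(0, len(self.dataset), self.batch_size):
            yield self.dataset[i:i + self.batch_size]

    def train(self):
        history = []
        for _ in range(self.epochs):
            epoch_loss = 0.0
            for batch in self._batches():
                epoch_loss += self.training_step(batch)
            history.append(epoch_loss)
        return history

# Toy "model": fit w in y = w * x by SGD on squared error.
state = {"w": 0.0}

def step(batch, lr=0.05):
    loss = 0.0
    for x, y in batch:
        err = state["w"] * x - y
        state["w"] -= lr * 2 * err * x  # gradient of (w*x - y)^2 w.r.t. w
        loss += err * err
    return loss

data = [(x, 3.0 * x) for x in [0.1, 0.2, 0.5, 1.0]]
history = Trainer(step, data, batch_size=2, epochs=20).train()
assert history[-1] < history[0]  # loss decreases as w approaches 3
```

The real classes add optimizers, schedulers, evaluation, and distributed/TPU support on top of this loop; the division of labour is the same.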
Rust tokenizers (@mfuntowicz, @n1t0) Tokenizers for Bert, Roberta, OpenAI GPT, OpenAI GPT2, Transf...
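Beyond raw speed, a key feature of the fast tokenizers is that each token carries its (start, end) character offsets into the original string, which downstream tasks like QA span extraction rely on. A toy illustration of that output shape, using a whitespace tokenizer (not the Rust implementation):

```python
import re

def encode_with_offsets(text):
    """Return (token, (start, end)) pairs, the offset-mapping shape that fast
    tokenizers expose alongside token ids (toy whitespace tokenizer)."""
    return [(m.group(), (m.start(), m.end())) for m in re.finditer(r"\S+", text)]

tokens = encode_with_offsets("Hello  world")
assert tokens == [("Hello", (0, 5)), ("world", (7, 12))]
```

With offsets available, a predicted answer span in token space can be mapped back to the exact characters of the original text without re-tokenizing.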
New class Pipeline (beta): easily run and use models on downstream NLP tasks. We have added a new cl...
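The idea behind a pipeline is to bundle preprocessing, the model, and postprocessing behind a single callable keyed by task name. A toy dispatcher sketching that structure (hypothetical, with a keyword-counting "model"; not the Transformers implementation):

```python
def pipeline(task, registry):
    """Look up a (preprocess, model, postprocess) triple for a task name and
    return a single text-in, result-out callable (toy sketch)."""
    preprocess, model, postprocess = registry[task]
    def run(text):
        return postprocess(model(preprocess(text)))
    return run

# Toy "sentiment" task: tokenize, count polarity words, map score to a label.
registry = {
    "sentiment-analysis": (
        lambda text: text.lower().split(),                         # preprocess
        lambda tokens: sum(+1 if t in {"great", "good"} else
                           -1 if t in {"bad", "awful"} else 0
                           for t in tokens),                       # "model"
        lambda score: {"label": "POSITIVE" if score >= 0 else "NEGATIVE",
                       "score": score},                            # postprocess
    )
}

nlp = pipeline("sentiment-analysis", registry)
assert nlp("this library is great")["label"] == "POSITIVE"
assert nlp("an awful bug")["label"] == "NEGATIVE"
```

The real class swaps the toy components for a tokenizer, a pretrained model, and task-specific decoding, but the caller-facing shape is the same: one call from raw text to a structured result.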
Longformer. Longformer (@ibeltagy); Longformer for QA (@patil-suraj + @patrickvonplaten); Longformer f...
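Longformer replaces full self-attention with a sparse pattern: each token attends within a local window, and a few designated "global" tokens attend to, and are attended by, every position. A small sketch that builds such an attention mask (a simplified boolean version; the real model implements this with custom kernels):

```python
def longformer_mask(seq_len, window, global_positions=()):
    """Boolean attention mask for a sliding-window + global-token pattern:
    mask[i][j] is True when position i may attend to position j."""
    mask = [[abs(i - j) <= window for j in range(seq_len)]
            for i in range(seq_len)]
    for g in global_positions:
        for j in range(seq_len):
            mask[g][j] = True  # global token sees all positions
            mask[j][g] = True  # all positions see the global token
    return mask

mask = longformer_mask(seq_len=8, window=1, global_positions=(0,))
assert mask[4][3] and mask[4][5]  # local neighbours visible
assert not mask[4][7]             # distant token masked out
assert mask[4][0] and mask[0][7]  # global token 0 connects everything
```

Because each row has only O(window) entries plus the global columns, attention cost grows linearly with sequence length instead of quadratically, which is what makes 4,096-token inputs practical.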
Name change: welcome Transformers Following the extension to TensorFlow 2.0, pytorch-transformers =...
T5 Model (@patrickvonplaten, @thomwolf) T5 is a powerful encoder-decoder model that formats every N...
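T5's text-to-text framing means every task, including classification, is expressed as string in, string out, with a task prefix on the input. A sketch of that formatting, using prefixes as described in the T5 paper (the helper itself is hypothetical):

```python
def to_text_to_text(task, text, target=None):
    """Cast a task example into T5's text-to-text format: prefix the input
    with a task description; the target, even a class label, is plain text."""
    prefixes = {
        "translation_en_to_de": "translate English to German: ",
        "summarization": "summarize: ",
        "cola": "cola sentence: ",  # grammatical-acceptability classification
    }
    source = prefixes[task] + text
    return (source, target)

src, tgt = to_text_to_text("summarization", "A long article ...", "A short summary.")
assert src.startswith("summarize: ")

# Classification labels are emitted as words, not class indices.
src, tgt = to_text_to_text("cola", "They sleeps.", "unacceptable")
assert tgt == "unacceptable"
```

Because inputs and outputs are uniformly strings, one pretrained model and one loss function cover translation, summarization, and classification alike; only the prefix changes.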