Access to large pre-trained models of varied architectures, in many different languages, is central to the democratization of NLP. We introduce PAGnol, a collection of French GPT models. Using scaling laws, we efficiently train PAGnol-XL (1.5B parameters) with the same computational budget as CamemBERT, a model 13 times smaller. PAGnol-XL is the largest model trained to date for the French language. We plan to train increasingly large and better-performing versions of PAGnol, exploring the capabilities of French extreme-scale models. For this first release, we focus on the pre-training and scaling calculations underlying PAGnol. We fit a scaling law for compute for the French language and compare it with its English counterpart. We find the pre-...
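Fitting a scaling law for compute, as the abstract above describes, typically means regressing loss against compute under an assumed power-law form such as L(C) = (C_c / C)^alpha. The sketch below illustrates the idea on synthetic data only; the form, the variable names, and all numeric values are assumptions for illustration, not figures from the PAGnol paper.

```python
# Minimal sketch: recovering a compute scaling law L(C) = (Cc / C)**alpha
# from (compute, loss) points via linear regression in log-log space.
# All values here are synthetic and illustrative.
import math

# Generate points exactly on a known law, so the fit should recover it.
alpha_true, cc_true = 0.05, 3.1e8
computes = [1e15, 1e16, 1e17, 1e18, 1e19]
losses = [(cc_true / c) ** alpha_true for c in computes]

# In log space the law is linear: log L = alpha*log Cc - alpha*log C,
# i.e. slope = -alpha and intercept = alpha*log Cc.
xs = [math.log(c) for c in computes]
ys = [math.log(l) for l in losses]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx

alpha_fit = -slope                         # recovered exponent
cc_fit = math.exp(intercept / alpha_fit)   # recovered critical compute
```

In practice one would fit measured training losses rather than synthetic points, and a nonlinear fit (e.g. least squares on the original scale) is often preferred when the data deviate from a pure power law.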
This paper presents methods to combine large language models trained from diverse text sources and a...
Artificial neural networks represent an HPC workload with increasing importance. In particular, the f...
Recent advances in NLP have significantly improved the performance of language models on a ...
Web site: https://camembert-model.fr. Pretrained language models are now ubiquitous in Natural Languag...
Language models have become a key step to achieve state-of-the-art results in many NLP tasks. Levera...
Distributed word representations are popularly used in many tasks in natural l...
Modern Natural Language Processing (NLP) models based on Transformer structures represent the state ...
We present the Modified French Treebank (MFT), a completely revamped French Treebank, derived from t...
We evaluate statistical parsing of French using two probabilistic models derived from the Tree Adjoi...
In the last five years, the rise of the self-attentional Transformer-based architectures led to stat...
In recent years, neural methods for Natural Language Processing (NLP) have consistently and repeated...
Many medical applications are envisioned for Large Language Models (LLMs), such as the automated sum...