International audience
This paper describes and compares the impact of training corpora of different types and sizes on language models such as ELMo. Addressing the fundamental question of quality versus quantity, we evaluate four French pre-training corpora on three downstream tasks: parsing, POS tagging and named-entity recognition. The paper studies the relevance of a new corpus, CaBeRnet, which covers a representative range of language usage, including a balanced variety of genres (oral transcriptions, newspapers, popular magazines, technical reports, fiction, academic texts) in both oral and written styles. We hypothesize that a linguistically representative and balanced corpus will allow the language model to be more efficient and representative...
Very few gold-standard annotated corpora are currently available for French. We present an ongoing p...
We present the Modified French Treebank (MFT), a completely revamped French Treebank, derived from t...
Johns and King (1991: iii) and Bernardini (2004: 16) consider corpora as helpf...
Old French parsing: which language properties have the greatest influence on ...
This work investigates the possibility of combining two different types of corpo...
The successes of contextual word embeddings learned by training large-scale la...
This paper presents the current status of the French treebank developed at Paris 7 (Abeillé et al....
This paper shows that training a lexicalized parser on a lemmatized, morphologically rich treebank su...
We use the multilingual OSCAR corpus, extracted from Common Crawl via language...
In the last five years, the rise of self-attentional Transformer-based architectures has led to stat...
Language models have become a key step in achieving state-of-the-art results in ...
In this paper, we investigate automatic tagging of French corpora and compare morpho-syntactic prope...
Recent advances in NLP have significantly improved the performance of language models on a ...
The current dominance of deep neural networks in natural language processing is based on contextual ...