International audience. Although neural machine translation (NMT) has proven to be the most efficient solution for normalising pre-orthographic texts, the amount of training data required remains an obstacle. In this paper, we address for the first time the case of normalising modern French, and we propose a workflow to create the parallel corpus that an NMT solution requires.
Text normalization methods have been commonly applied to historical language or user-generated conte...
In this paper we describe the construction of a parallel corpus between the standard and a non-stan...
One of the most persistent characteristics of written user-generated content (UGC) is the use of non...
Spelling normalisation is a useful step in the study and analysis of historica...
The study of older states of languages faces a double problem: on the one ha...
The creation of text corpora requires a sequence of processing steps in order ...
We investigate the creation of a 17th c. French literary corpus. We present th...
Our digital resource of 88,000 anonymised French text messages, the 88milSMS c...
Linguistic change in 17th c. France: new scriptometric approaches. The end of t...
The work reported in this paper consisted in the creation of an automatic normalization tool for non...
With the development of big corpora of various periods, it becomes crucial to standardise linguistic ...