We investigate the creation of a 17th-century French literary corpus. We present the main options regarding available standards, the training data we created, and the efficiency of the models produced for OCR, spelling normalization, and lemmatization, always with open-source solutions. We also present our encoding choices and the overall logic of a corpus designed as a virtuous circle, automatically enhancing the tools used for its construction.
The Corpus for Idiolectal Research (CIDRE) is a collection of fiction works from 11 prolific 19th-ce...
Lexical and etymological study of the vocabulary of the Morocosmie (1583) by Josep Du Chesne (1544-1609)...
With the development of big corpora of various periods, it becomes crucial to standardise linguistic ...
Machine learning begins with machine teaching: in the following paper, we present the data that we h...
Linguistic change in 17th c. France: new scriptometric approaches. The end of t...
The "Preclassical" French language period extends throughout the sixteenth cen...
8 pages, 2 figures, 4 tables. Language models for historical states of language ...
A corpus containing all digitized French novels from the beginning of print (the first entry is from...
This article analyses the constraints and methodological choices involved in processing a pre-modern...
The texts printed during the Fronde (“mazarinades”) and written in “burlesque” style are a set of do...
The study of old states of language faces a double problem: on the one ha...
Presentation of a corpus of some 17th-century French plays that will be posted...
To explore the role of extended phraseology in the structuring of literary textual genres in mediev...
From the perspective of French tagged corpora, the period from the sixteenth t...