Spelling variation is one of the key challenges for NLP on historical texts. The problem is especially acute for non-standard languages: where there is no consensus on a spelling convention, the texts cannot be normalized. In this paper, we explore research on the detection of spelling variation in order to develop a system that extracts rules of spelling variation in Alsatian dialects. Using the corpus of dramas from the MeThAL project, we present a method for extracting variation rules at the level of character n-grams: first extracting character n-grams from the sub-corpus using statistical measures, then clustering the feature forms, and finally aligning the different clusters and extracting the rules of variation and e...
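As a rough illustration of two of the steps named in the abstract (character n-gram extraction and alignment-based rule extraction), a minimal Python sketch follows. It is not the authors' system: the function names `char_ngrams`, `ngram_counts`, and `variation_rules` are hypothetical, raw counts stand in for the unspecified statistical measures, the clustering step is omitted, and the alignment uses the standard library's `difflib.SequenceMatcher` as a stand-in for whatever alignment the paper uses. The word forms are toy inputs, not corpus data.

```python
from collections import Counter
from difflib import SequenceMatcher


def char_ngrams(word, n):
    """Return the character n-grams of a word, padded with '#' boundary markers."""
    padded = f"#{word}#"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def ngram_counts(words, n):
    """Count character n-grams over a list of word forms (raw frequency only)."""
    counts = Counter()
    for word in words:
        counts.update(char_ngrams(word, n))
    return counts


def variation_rules(form_a, form_b):
    """Align two spellings and return the (old, new) substrings where they differ."""
    matcher = SequenceMatcher(None, form_a, form_b, autojunk=False)
    return [
        (form_a[i1:i2], form_b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


if __name__ == "__main__":
    # Toy inputs chosen for illustration only.
    print(ngram_counts(["haus", "hüs"], 2).most_common(3))
    print(variation_rules("haus", "hüs"))
```

A real pipeline would replace the raw counts with an association or keyness measure and would align clusters of forms rather than a single pair.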
Analysis of English historical texts poses a number of obstacles for standard corpus analysis and an...
The vast corpus of 14th-century charters, composed by Piet van Reenen and Maaike Mulder, will be used...
This corpus-based study focuses on the graphemic realisations of several derivational suffixes in t...
This article presents new pronunciation dictionaries for the under-resourced A...
At the MeThAl project, we are creating the first large TEI corpus of Alsatian ...
In this article, we describe the respective approaches we have taken when addressing issues of spell...
In this paper, we present experiments on POS tagging historical texts that contain spelling variatio...
The topic of this bachelor thesis is the orthography of French lexicon in contemporary German. Selec...
Large quantities of spelling variation in corpora, such as that found in Early Modern English, can c...
The paper explores trends in spelling variation as reflected in Early English correspondence (15th–1...
Linguistic change in 17th c. France: new scriptometric approaches. The end of t...
The extraction of relevant information in texts constitutes a fundamental process of text mining. We...
Medieval manuscripts or other written documents from that period contain valuable information about ...