Spelling normalisation is an important step in making natural language processing (NLP) tools usable for analysing historical text. We first compare and then combine two approaches: VARD, a rule-based system built on dictionary lookup and rules with non-probabilistic but trainable weights, and a language-independent approach to spelling normalisation based on statistical machine translation (SMT) techniques. The rule-based system reaches the best accuracy, up to 94% precision at 74% recall, while the SMT system improves each tested period. We obtain the best system by combining both approaches. Re-training VARD on specific time periods and domains is beneficial, and both systems b...
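To make the rule-based idea concrete, the following is a minimal sketch of dictionary lookup combined with weighted rewrite rules, followed by a precision and recall calculation. It is not VARD's implementation. The lexicon, the rules, their weights, the helper names and the toy data are all hypothetical. The abstract does not define precision and recall, so the sketch assumes precision is the share of changed tokens that match the gold form, and recall is the share of tokens needing a change that were changed correctly.

```python
# Hypothetical toy resources for illustration only; not taken from VARD.
LEXICON = {"love", "up", "said", "the", "be"}
# (historical pattern, modern replacement, weight); weights are hand-set here.
RULES = [("u", "v", 0.9), ("v", "u", 0.8), ("y", "i", 0.7), ("ee", "e", 0.6)]


def normalise(word, lexicon=LEXICON, rules=RULES):
    """Return a modern form for `word`, or `word` unchanged if no rule helps."""
    if word in lexicon:  # dictionary lookup: already a known modern form
        return word
    best, best_weight = word, 0.0
    for old, new, weight in rules:
        start = word.find(old)
        while start != -1:  # try the rule at every position where it matches
            candidate = word[:start] + new + word[start + len(old):]
            # Accept a candidate only if the lexicon knows it; keep the
            # candidate produced by the highest-weighted rule.
            if candidate in lexicon and weight > best_weight:
                best, best_weight = candidate, weight
            start = word.find(old, start + 1)
    return best


def precision_recall(pairs, normaliser=normalise):
    """Assumed definitions: precision over changed tokens, recall over
    tokens whose gold form differs from the original."""
    changed = correct = needed = 0
    for original, gold in pairs:
        predicted = normaliser(original)
        if gold != original:
            needed += 1
        if predicted != original:
            changed += 1
            if predicted == gold:
                correct += 1
    precision = correct / changed if changed else 0.0
    recall = correct / needed if needed else 0.0
    return precision, recall


# Toy (original, gold) pairs: "bee" is wrongly changed to "be" because the
# lexicon lacks "bee"; "goode" is missed because the lexicon lacks "good".
DATA = [("loue", "love"), ("vp", "up"), ("sayd", "said"),
        ("the", "the"), ("bee", "bee"), ("goode", "good")]

if __name__ == "__main__":
    print(precision_recall(DATA))
```

On this toy data the normaliser makes four changes, three of them correct, and four tokens need a change, so both scores are 0.75. The two failure cases show why precision and recall can diverge on real data: an incomplete lexicon causes both spurious changes and missed variants.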
The development of (semi-)automatic tools such as the VARD (Baron and Rayson, 2008) has afforded com...
Language technology tools can be very useful for making information concealed in historical documen...
The lack of a spelling convention in historical documents makes their orthography change dependin...
To be able to use existing natural language processing tools for analysing historical text, an impor...
To be able to use existing natural language processing tools for analysing historical text, an impor...
Spelling normalization is the task of normalizing non-standard words into standard words in texts, res...
Natural language processing for historical text poses a variety of challenges, such as dealing wit...
Historical texts are an important resource for researchers in the humanities. However, standard NLP ...
This paper presents work on manual and semi-automatic normalization of historical language data. We ...
Corpora of Early Modern English have been collected and released for research for a number of years....
Large quantities of spelling variation in corpora, such as that found in Early Modern English, can c...
Corpora of Early Modern English have been collected and released for research for a number of years....
Historical text constitutes a rich source of information for historians and other researchers in hum...
Language technology tools can be very useful for making information concealed in historical docume...