Text normalization is the task of mapping non-canonical language, typical of speech transcription and computer-mediated communication, to a standardized written form. It is an upstream task that enables the direct use of standard natural language processing tools, and it is indispensable for languages such as Swiss German, which has strong regional variation and no written standard. Text normalization has been addressed with a variety of methods, most successfully with character-level statistical machine translation (CSMT). Meanwhile, machine translation has changed, and the new methods, known as neural encoder-decoder (ED) models, have brought remarkable improvements. Text normalization, however, has not yet followed. A number...
Historical text normalization, the task of mapping historical word forms to their modern counterpart...
The creation of text corpora requires a sequence of processing steps in order ...
Social media texts have become one of the most used forms of written language and a valuable source ...
Text normalization is the task of mapping non-canonical language, typical of speech transcription an...
Text normalization is the task of mapping noncanonical language, typical of speech transcription and...
One of the most persistent characteristics of written user-generated content (UGC) is the use of non...
To study and automatically process Swiss German, it is necessary to resolve the issue of variation i...
Swiss dialects of German are, unlike most dialects of well standardised languages, widely used in ev...
This is an accepted manuscript of an article published by IEEE in 2018 3rd International Conference ...
Text normalization methods have been commonly applied to historical language or user-generated conte...
The goal of this work is to design a machine translation (MT) system for a low-resource family of di...
One of the main characteristics of social media data is the use of non-standard language. Since NLP ...
Historical documents are increasingly being made available in digitized form. Often they are ...
This paper proposes an architecture, based on statistical machine translation, for developing the te...
Lexical normalization is the task of transforming an utterance into its standardized form. This task...