This paper describes the text normalization module of a fully trainable text-to-speech conversion system and its application to number transcription. The main goal is to develop a language-independent text normalization module that is learned from data rather than built on expert rules. The paper proposes a general architecture based on statistical machine translation techniques, composed of three main modules: a tokenizer that splits the input text into a token graph, a phrase-based translation module that translates the tokens, and a post-processing module that removes some of the tokens. The architecture has been evaluated for number transcription in several languages: English, Spanish and Romanian. Number transcription is an important aspect ...
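The three-module pipeline described above (tokenizer, phrase-based translation, post-processing) can be illustrated with a small Python sketch. It is not the authors' implementation, and several pieces are assumptions. The `digit:position` token format, the `<s>`/`</s>` boundary markers, the `<del>` placeholder, the hand-written English phrase table covering 0 to 999, and all function names are hypothetical. A linear token sequence stands in for the token graph. Greedy longest-match lookup stands in for a statistically trained phrase table with scored decoding.

```python
from typing import Dict, List, Tuple

# Hypothetical phrase table type: source token phrase -> target word phrase.
PhraseTable = Dict[Tuple[str, ...], Tuple[str, ...]]

DELETE = "<del>"  # placeholder target token, removed by post-processing

UNITS = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]
TEENS = ["ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
         "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def tokenize(number: str) -> List[str]:
    """Split a digit string into 'digit:position' tokens plus boundary markers.

    Position 0 is the units column, 1 the tens, 2 the hundreds. A linear
    sequence stands in for the token graph of the described architecture.
    """
    if not number or len(number) > 3 or any(c not in "0123456789" for c in number):
        raise ValueError("this sketch only handles whole numbers from 0 to 999")
    if len(number) > 1 and number[0] == "0":
        raise ValueError("leading zeros are not supported")
    n = len(number)
    return ["<s>"] + [f"{d}:{n - 1 - i}" for i, d in enumerate(number)] + ["</s>"]


def build_phrase_table() -> PhraseTable:
    """Hand-written toy English table; a trainable system would learn these pairs."""
    table: PhraseTable = {
        ("<s>",): (DELETE,),
        ("</s>",): (DELETE,),
        ("<s>", "0:0", "</s>"): ("zero",),  # the number 0 on its own
    }
    for d in range(10):
        table[(f"{d}:0",)] = (UNITS[d],) if d else (DELETE,)
        table[(f"{d}:2",)] = (UNITS[d], "hundred") if d else (DELETE,)
        table[("1:1", f"{d}:0")] = (TEENS[d],)  # two-token phrase for 10-19
        if d != 1:  # a tens digit of 1 is only covered by the teens phrases
            table[(f"{d}:1",)] = (TENS[d],) if d >= 2 else (DELETE,)
    return table


def translate(tokens: List[str], table: PhraseTable) -> List[str]:
    """Monotone greedy decoding: always take the longest matching source phrase."""
    max_len = max(len(src) for src in table)
    out: List[str] = []
    i = 0
    while i < len(tokens):
        for length in range(min(max_len, len(tokens) - i), 0, -1):
            src = tuple(tokens[i:i + length])
            if src in table:
                out.extend(table[src])
                i += length
                break
        else:  # no phrase of any length matched at position i
            raise KeyError(f"no phrase covers token {tokens[i]!r}")
    return out


def post_process(words: List[str]) -> str:
    """Remove placeholder tokens and join the remaining words with spaces."""
    return " ".join(w for w in words if w != DELETE)


def transcribe(number: str) -> str:
    return post_process(translate(tokenize(number), build_phrase_table()))
```

For example, `tokenize("305")` yields `["<s>", "3:2", "0:1", "5:0", "</s>"]`. Translation produces `<del> three hundred <del> five <del>`, and post-processing reduces this to `"three hundred five"`. The two-token phrases such as `("1:1", "5:0") -> "fifteen"` show why a phrase-based model suits number transcription: some digit groups have to be translated jointly, not digit by digit. A real system would score competing segmentations with translation and language model probabilities rather than always taking the longest match.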
We report insights from translating Spanish conversational telephone speech into English text by cas...
Letter-to-phone conversion, as part of the natural language processing stage, plays a very important...
Text normalization is the task of mapping non-canonical language, typical of speech transcription an...
This paper proposes an architecture, based on statistical machine translation, for developing the te...
This paper is devoted to the text normalization module in our text-to-speech synthesis system. We fo...
Text normalization methods have been commonly applied to historical language or user-generated conte...
The creation of text corpora requires a sequence of processing steps in order to constitute, normali...
With the growth of the social web, user-generated text data has reached unprecedented sizes. Non-can...
There are about 7,000 languages spoken today in the world. However, most natural language processing...
Most areas of natural language processing today make heavy use of automatic inference from large cor...
This paper describes a process of text normalization sy...
Short Messaging Service (SMS) texts behave quite differently from normal written texts and have som...
This paper describes an approach to pre-process SMS text for Machine Translation. As SMS text behave...