This paper describes a new methodology for defining speech corpora from Internet documents, with the aim of recording a large speech database dedicated to training and testing acoustic models for speech recognition. The first section presents the Web robot in charge of collecting Web pages from the Internet, then explains the mechanism that filters Web text into French sentences. Some information about the corpus organization (90% for training and 10% for testing) is given. The third section presents the phoneme distribution of the corpus and compares it with other studies of the French language. Finally, the tools and planning for recording the speech database with more than one hundred speakers are described.
The construction of a speech recognition system requires a recorded set of phr...
The Bavarian Archive for Speech Signals has released three new speech corpora for both industrial an...
Corpora and web corpora. Reflection on the essence of digital corpora. This pa...
In statistical language modelling research, there is a lack of huge text corpora, especially for s...
Spoken language speech recognition systems need better understanding of natura...
The three pillars of an automatic speech recognition system are the lexicon, the language model and t...
This paper presents the results of the NEOLOGOS project: a children database and an optimized adult ...
Language models used in current automatic speech recognition systems are trained on general-purpose ...
This paper describes the setting up of a resource database for research and evaluation in the domain...
The WWW is a ubiquitous, mature communication infrastructure for business and scientific informatio...
This paper discusses the adaptation of speech recognition vocabularies for aut...
This paper describes methods that exploit stenographic transcripts of the German parliament to impro...
Language registers are a strongly perceptible characteristic of texts and spee...