Abstract—Smartphones provide an efficient means for the collection of speech data; however, the quality of the corpora created in this fashion is not predictable. We describe an approach that allows us to post-process and rank utterances in a prompted speech corpus quickly and effectively. Utterance ranking makes it possible both to select the utterances most likely to be correct and to evaluate the quality of the resulting corpus from a limited sample. This approach has been applied to a collection in the eleven official languages of South Africa, and we show that it naturally leads to the creation of stratified corpora from the same collection. Such corpora can be useful for different purposes, and corpus users are pr...
In this contribution, the design, collection, annotation and planned distribution of a new spoken la...
In this paper we apply speech recognition for automatic transcript generation for spoken document r...
Summarization: We investigate algorithms and tools for the semi-automatic authoring of grammars for ...
Abstract: The official languages of South Africa can still be classified as under-resourced with respe...
This work was supported by the Department of Arts and Culture. The NCHLT speech corpus contains wide-...
Thesis (M.Ing. (Computer Engineering))--North-West University, Potchefstroom Campus, 2009. The rapid ...
Each time a word...
The newest generation of speech technology caused a huge increase of audio-visual data nowadays bein...
South Africa has eleven official languages, ten of which are considered “resource-scarce”. For these...
Articulatory data offers promising developments in our understanding of speech production and advanc...
In this paper we present an integrated unsupervised method to produce a qual...
This thesis introduces a general method for using information at the utterance level and across utte...
Thesis (M. Ing. (Computer and Electronical Engineering))--North-West University, Potchefstroom Campu...
Abstract: Problem statement: Overgeneration-and-ranking architecture works well in written language ...
In state-of-the-art large vocabulary automatic recognition systems, a large statistical language mod...