Recent research on the TIMIT database suggests that longer-length acoustic units are better suited for modelling pronunciation variation and long-term temporal dependencies in speech than traditional phoneme-length units, yielding substantial improvements in recognition accuracy [9]. In this paper, we investigate whether similar improvements can be gained on another database, viz. excerpts from novels in a Dutch library for the blind. We use a hierarchical method that employs a mixture of word-, syllable- and phoneme-length units. Our results show that the approach does increase word accuracy, but to a lesser extent than expected. The paper discusses possible explanations for this finding.
The phonetic word is of crucial importance for continuous speech recognition. This is because the wo...
In general the aim of an automatic speech recognition system is to write down what is said. State of...
Generally speaking, the speaker-dependence of a speech recognition system stems from speaker-depende...
Recent research on the TIMIT corpus suggests that longer-length acoustic units are better suited for...
Recent research on the TIMIT corpus suggests that longer-length acoustic units are better suited for ...
Recent research on the TIMIT corpus suggests that longer-length acoustic models are more appropriat...
There is now considerable evidence from psycholinguistic and phonetic research that fine-phonetic va...
Including information distributed over intervals of syllabic duration (100--250 ms) may greatly impr...
Transforming an acoustic signal to words is the gold standard in automatic speech recognition. Whil...
The large pronunciation variability of words in conversational speech is one of the major causes of ...
Many models of spoken word recognition posit the existence of lexical and sublexical representations...
Studies from multiple disciplines show that spectro-temporal units of natural languages and human sp...
Several models of spoken word recognition postulate that recognition is achieved via a process of co...
In pursuance of better performance, current speech recognition systems tend to use more and more com...
This article analyzes the phonetic decoding performance obtained with differen...