Previous studies have confirmed that augmenting acoustic features with place and manner of articulation features guides the speech enhancement (SE) process to take the articulatory properties of the input speech into account, which improves performance. This suggests that the contextual information of articulatory attributes carries additional information that can further benefit SE. This study proposed an SE system that improves performance by optimizing contextual articulatory information in the enhanced speech: the SE model is trained jointly with an end-to-end automatic speech recognition (E2E-ASR) model that predicts the sequence of broad phone classes (BPCs) instead of the phoneme or word sequence. We developed tw...
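To make the BPC idea in the abstract above concrete, the sketch below shows how a phoneme transcript could be collapsed into a broad-phone-class target sequence, and how an SE loss and an ASR loss might be combined during joint training. The phone table, the class names, the function names, and the weighted-sum form with `alpha` are all illustrative assumptions; none of them is taken from the paper.

```python
# Hypothetical phoneme-to-BPC table grouped by manner of articulation.
# The phone inventory and class names are assumptions for illustration,
# not the grouping used in the study.
PHONE_TO_BPC = {
    "aa": "vowel", "iy": "vowel", "uw": "vowel", "eh": "vowel",
    "p": "stop", "t": "stop", "k": "stop",
    "b": "stop", "d": "stop", "g": "stop",
    "s": "fricative", "z": "fricative", "f": "fricative", "sh": "fricative",
    "m": "nasal", "n": "nasal", "ng": "nasal",
    "l": "semivowel", "r": "semivowel", "w": "semivowel", "y": "semivowel",
    "sil": "silence",
}


def phones_to_bpcs(phones):
    """Map a phoneme sequence to its broad-phone-class sequence."""
    return [PHONE_TO_BPC[p] for p in phones]


def joint_loss(se_loss, asr_loss, alpha=0.5):
    """Combine the two objectives as a weighted sum.

    The weighted-sum form and the default alpha are assumptions; the
    abstract only states that the SE and E2E-ASR models are trained jointly.
    """
    return se_loss + alpha * asr_loss


if __name__ == "__main__":
    # The ASR branch would be trained on these coarser labels
    # rather than on the phonemes themselves.
    print(phones_to_bpcs(["s", "iy", "t"]))
    print(joint_loss(1.0, 2.0, alpha=0.5))
```

Because several phonemes share one class, the target vocabulary for the ASR branch is much smaller than a phoneme or word vocabulary, which is the property the BPC formulation relies on.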
Phonological studies suggest that the typical subword units such as phones or phonemes used in autom...
A novel framework for automatic articulatory-acoustic feature extraction has been developed for enha...
Speech recognition has become common in many application domains, from dictation systems for profess...
We address the problem of reconstructing articulatory movements, given audio and/or phonetic labels....
This paper combines acoustic features with a high temporal and a high frequency resolution to reliab...
Personal rare word recognition in end-to-end Automatic Speech Recognition (E2E ASR) models is a chal...
Kirchhoff K, Fink GA, Sagerer G. Combining acoustic and articulatory feature information for robust ...
Human speech processing is inherently multi-modal, where visual cues (lip movements) help better und...
Current Automatic Speech Recognition (ASR) systems fail to perform nearly as well as human speech re...
Contextual information plays a crucial role in speech recognition technologies and incorporating it ...
It is often argued that acoustic-phonetic or articulatory features could be beneficial to automatic ...
Smart devices continue to proliferate, and they provide a variety of functions to assist end users. ...
The frame-synchronized framework has dominated many speech processing systems, such as ASR and AED t...
The past decade has seen phenomenal improvement in the performance of Automatic Speech Recognition (...