Current leading mispronunciation detection and diagnosis (MDD) systems achieve promising performance via end-to-end phoneme recognition. One challenge for such end-to-end solutions is the scarcity of human-annotated phoneme labels for natural L2 speech. In this work, we leverage unlabeled L2 speech via a pseudo-labeling (PL) procedure and extend the fine-tuning approach based on pre-trained self-supervised learning (SSL) models. Specifically, we use Wav2vec 2.0 as our SSL model and fine-tune it on the original labeled L2 speech samples plus the newly created pseudo-labeled L2 speech samples. Our pseudo labels are dynamic: they are produced on the fly by an ensemble of the online model, which ensures that our model is robust to pseudo-label noise. We show t...
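The dynamic pseudo-labeling loop described above can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in the abstract: the "ensemble of the online model" is approximated here by an exponential moving average (EMA) of the online weights, a tiny softmax classifier stands in for Wav2vec 2.0, and every name (`train`, `ema_update`, `sgd_step`, `decay`, and so on) is hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_probs(weights, x):
    """Class probabilities of a linear softmax classifier (toy stand-in model)."""
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]
    return softmax(logits)


def predict_label(weights, x):
    """Index of the most probable class (first index wins on ties)."""
    probs = predict_probs(weights, x)
    return max(range(len(probs)), key=probs.__getitem__)


def sgd_step(weights, x, y, lr):
    """One in-place cross-entropy gradient step on a single example."""
    probs = predict_probs(weights, x)
    for c, w in enumerate(weights):
        err = probs[c] - (1.0 if c == y else 0.0)
        for i, x_i in enumerate(x):
            w[i] -= lr * err * x_i


def ema_update(teacher, student, decay):
    """Assumed ensembling rule: teacher <- decay * teacher + (1 - decay) * student."""
    for t_row, s_row in zip(teacher, student):
        for i, s in enumerate(s_row):
            t_row[i] = decay * t_row[i] + (1.0 - decay) * s


def train(labeled, unlabeled, num_classes, dim, epochs=20, lr=0.5, decay=0.9):
    """Train the online (student) model on labeled plus pseudo-labeled samples."""
    student = [[0.0] * dim for _ in range(num_classes)]
    teacher = [[0.0] * dim for _ in range(num_classes)]
    for _ in range(epochs):
        for x, y in labeled:
            sgd_step(student, x, y, lr)
            ema_update(teacher, student, decay)
        for x in unlabeled:
            # Pseudo labels are regenerated on every pass, so they change
            # as the averaged model improves instead of being fixed up front.
            y_hat = predict_label(teacher, x)
            sgd_step(student, x, y_hat, lr)
            ema_update(teacher, student, decay)
    return student, teacher
```

Calling `train` with a few labeled `(features, label)` pairs and a list of unlabeled feature vectors returns both the online weights and their averaged copy. In this sketch the averaged copy, rather than the latest online weights, produces the labels, so a single noisy update to the online model has only a damped effect on the pseudo labels.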
This thesis reports the investigations into the task of phone-level pronunciation error detection, t...
This paper presents InterMPL, a semi-supervised learning method of end-to-end automatic speech recog...
In recent years, speech-based self-supervised learning (SSL) has made significant progress in variou...
The cross-domain performance of automatic speech recognition (ASR) could be severely hampered due to...
Computer-Assisted Pronunciation Training (CAPT) plays an important role in language learning. Conven...
State-of-the-art automatic speech recognition (ASR) systems are trained with tens of thousands of ho...
End-to-end (E2E) neural models are increasingly attracting attention as a promising modeling approac...
Mispronunciation detection and diagnosis (MDD) is a popular research focus in computer-aided pronunc...
For self-supervised speaker verification, the quality of pseudo labels decides the upper bound of th...
Mispronunciation detection and diagnosis (MDD) technology is a key component of computer-assisted pr...
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer...
State-of-the-art speaker verification systems are inherently dependent on some kind of human supervi...
Self-supervised representation learning (SSRL) has improved the performance on downstream phoneme re...