Speaker dependent (SD) ASR systems have significantly lower word error rates (WER) than speaker independent (SI) systems. However, SD systems require sufficient training data from the target speaker, which is impractical to collect in a short time. We present a technique for training SD models using just a few minutes of the target speaker's data. We compensate for the lack of adequate speaker-specific data by selecting neighbours from a database of existing speakers who are acoustically close to the target speaker. These neighbours provide ample training data, which is used to adapt the SI model to obtain an initial SD model for the new speaker with significantly lower WER. We evaluate various neighbour selection algorithms on a large-scale medi...
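The neighbour selection idea can be illustrated with a minimal sketch. The excerpt does not say which selection algorithms or distance measures the paper evaluates, so everything below is an assumption made for illustration: each speaker is summarised by the mean of their feature frames, closeness is plain Euclidean distance, and the names `mean_vector`, `select_neighbours`, and `speaker_db` are hypothetical.

```python
import math


def mean_vector(frames):
    """Average a list of equal-length feature frames into one vector."""
    dim = len(frames[0])
    return [sum(frame[i] for frame in frames) / len(frames) for i in range(dim)]


def select_neighbours(target_frames, speaker_db, k):
    """Return the ids of the k database speakers closest to the target.

    Assumption: closeness is the Euclidean distance between mean feature
    vectors. This is a stand-in, not the measure used in the paper.
    """
    target = mean_vector(target_frames)
    scored = sorted(
        (math.dist(target, mean_vector(frames)), speaker_id)
        for speaker_id, frames in speaker_db.items()
    )
    return [speaker_id for _, speaker_id in scored[:k]]


# Toy data: two-dimensional "features" for three existing speakers.
speaker_db = {
    "spk_a": [[0.0, 0.0], [0.2, 0.0]],
    "spk_b": [[5.0, 5.0], [5.0, 5.0]],
    "spk_c": [[1.0, 1.0], [1.0, 1.0]],
}
target_frames = [[0.0, 0.2], [0.2, 0.0]]  # a few frames from the new speaker

neighbours = select_neighbours(target_frames, speaker_db, k=2)
print(neighbours)
```

In a pipeline of the kind the abstract describes, the data of the selected neighbours would then be pooled and used to adapt the SI model. That adaptation step is not shown here.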
LREC2006: the 5th international conference on Language Resources and Evaluation, May 2006. This paper...
Acoustic variability across speakers is one of the challenges of speaker independent speech recognit...
This paper compares schemes for the selection of multi-genre broadcast data and corresponding transc...
Speaker dependent (SD) ASR systems have significantly lower word error rates (WER) compared to sp...
Linear regression based speaker adaptation approaches can improve Automatic Speech Recognition (ASR)...
Traditional text independent speaker recognition systems are based on Gaussian Mixture Models (GMMs)...
The performance of speech recognition systems in translating voice to text is still an issue in la...
Automatic speech recognition (ASR) in the educational environment could be a solution to address the...
This paper is concerned with automatic speech recognition (ASR) for accented speech. Given a small a...
INTERSPEECH2007: 8th Annual Conference of the International Speech Communication Association, August...
LVCSR performance is consistently poor on low-proficiency non-native speech. While gains from speake...
Automatic speech recognition (ASR) technology has matured over the past few decades and has made sig...
Inter-speaker variation can be coped with rather well in speech recognition by speaker adaptation techniq...
Self-supervised learning (SSL) has been able to leverage unlabeled data to boost the performance of ...
This paper investigates techniques to compensate for the effects of regional accents of British Engl...