The dual-encoder structure successfully uses two language-specific encoders (LSEs) for code-switching speech recognition. Because the LSEs are initialized from two pre-trained language-specific models (LSMs), the dual-encoder structure can exploit sufficient monolingual data and capture the attributes of each language. However, most existing methods impose no language constraints on the LSEs and underutilize the language-specific knowledge of the LSMs. In this paper, we propose a language-specific characteristic assistance (LSCA) method to mitigate these problems. Specifically, during training, we introduce two language-specific losses as language constraints and generate corresponding language-specific targets for them. During decoding, we take the dec...
The recent development of neural network-based automatic speech recognition (ASR) systems has greatl...
End-to-end formulation of automatic speech recognition (ASR) and speech translation (ST) makes it ea...
This work explores multilingual speech synthesis. We compare three models based on Tacotron that uti...
The bi-encoder structure has been intensively investigated in code-switching (CS) automatic speech r...
In this paper, we propose novel structured language modeling methods for code mixing speech recogni...
We propose a) a Language Agnostic end-to-end Speech Translation model (LAST), and b) a data augmenta...
We propose two end-to-end neural configurations for language diarization on bilingual code-switching...
In this work, we seek to build effective code-switched (CS) automatic speech recognition systems (AS...
Code-switching refers to alternating between languages during communication. Training end-to-end (E2E) ...
This research addresses the language model (LM) domain mismatch problem in automatic speech recognit...
In this study, we present improvements in N-best rescoring of code-switched speech achieved by n-gra...
This paper describes the integration of language identification (LID) into a multilingual a...
One of the things that need to change when it comes to machine translation is the models' ability to...
In this paper, phoneme sequences are used as language information to perform code-switched...