In this work, we seek to build effective code-switched (CS) automatic speech recognition (ASR) systems under the zero-shot setting, where no transcribed CS speech data is available for training. Previously proposed frameworks that conditionally factorize the bilingual task into its constituent monolingual parts are a promising starting point for leveraging monolingual data efficiently. However, these methods require the monolingual modules to perform language segmentation. That is, each monolingual module must simultaneously detect CS points and transcribe speech segments of one language while ignoring those of other languages -- not a trivial task. We propose to simplify each monolingual module by allowing it to transcribe all speech s...
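To make the factorization idea concrete, here is a minimal, illustrative Python sketch of a composition step of the kind implied above: two hypothetical monolingual modules each decode the whole utterance frame-synchronously (greedy CTC is assumed purely for simplicity), and their emissions are merged in time order into a bilingual hypothesis. The vocabularies, module outputs, and merging rule are all assumptions for illustration, not the formulation used in this work or in the prior conditional-factorization frameworks it builds on.

import numpy as np

# Toy vocabularies for two hypothetical monolingual modules; index 0 is the CTC blank.
VOCAB_EN = ["<b>", "i", "really", "like"]
VOCAB_ZH = ["<b>", "猫", "咖啡", "音乐"]

def greedy_ctc(posteriors, vocab, blank=0):
    """Greedy CTC decoding: argmax per frame, collapse repeats, drop blanks.
    Returns (frame, token, confidence) triples so emissions can be merged in time."""
    ids = posteriors.argmax(axis=1)
    out, prev = [], blank
    for t, i in enumerate(ids):
        if i != prev and i != blank:
            out.append((t, vocab[i], float(posteriors[t, i])))
        prev = i
    return out

def compose_bilingual(post_en, post_zh):
    """Illustrative composition: each monolingual module transcribes the whole utterance,
    and the bilingual hypothesis interleaves their emissions by time, keeping only the
    more confident emission when both modules fire at the same frame."""
    emissions = greedy_ctc(post_en, VOCAB_EN) + greedy_ctc(post_zh, VOCAB_ZH)
    emissions.sort(key=lambda e: (e[0], -e[2]))  # by frame, then by descending confidence
    merged, last_frame = [], -1
    for frame, token, _ in emissions:
        if frame != last_frame:  # drop the less confident emission at a tied frame
            merged.append(token)
            last_frame = frame
    return merged

# Tiny synthetic example: 8 frames, 4-symbol vocabularies.
rng = np.random.default_rng(0)
post_en = rng.dirichlet(np.ones(4), size=8)
post_zh = rng.dirichlet(np.ones(4), size=8)
print(compose_bilingual(post_en, post_zh))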
Rapid deployment of automatic speech recognition (ASR) in new languages, with very limited data, is ...
One thing that needs to change in machine translation is the models' ability to...
The dual-encoder structure successfully utilizes two language-specific encoders (LSEs) for code-switchin...
The bi-encoder structure has been intensively investigated in code-switching (CS) automatic speech r...
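The dual-/bi-encoder structure mentioned in the two entries above translates into a fairly compact model. Below is a minimal PyTorch sketch of that general idea, assuming (hypothetically) two LSTM language-specific encoders, a frame-level softmax gate that mixes their outputs, and a shared CTC head over a joint vocabulary; the layer choices, dimensions, and gating mechanism are illustrative assumptions, not the exact architectures studied in those papers.

import torch
import torch.nn as nn

class BiEncoderCS(nn.Module):
    """Minimal sketch of a dual-/bi-encoder code-switching ASR front-end (assumed layout):
    two language-specific encoders process the same features, and a frame-level gate
    mixes their outputs before a shared CTC head."""
    def __init__(self, feat_dim=80, hidden=256, vocab=5000):
        super().__init__()
        self.enc_lang1 = nn.LSTM(feat_dim, hidden, num_layers=2, batch_first=True)
        self.enc_lang2 = nn.LSTM(feat_dim, hidden, num_layers=2, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(2 * hidden, 2), nn.Softmax(dim=-1))
        self.ctc_head = nn.Linear(hidden, vocab)  # shared output over the joint vocabulary

    def forward(self, feats):                              # feats: (batch, time, feat_dim)
        h1, _ = self.enc_lang1(feats)
        h2, _ = self.enc_lang2(feats)
        w = self.gate(torch.cat([h1, h2], dim=-1))         # (batch, time, 2) language weights
        mixed = w[..., :1] * h1 + w[..., 1:] * h2          # frame-wise interpolation
        return self.ctc_head(mixed)                        # logits for CTC training

x = torch.randn(2, 120, 80)   # toy batch: 2 utterances, 120 frames, 80-dim features
logits = BiEncoderCS()(x)
print(logits.shape)           # torch.Size([2, 120, 5000])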
We present a method for cross-lingual training of an ASR system using absolutely no transcribed trainin...
We explore cross-lingual multi-speaker speech synthesis and cross-lingual voice conversion applied t...
In this paper, we introduce our work of building a Streaming Multilingual Speech Model (SM2), which ...
The idea of combining multiple languages’ recordings to train a single automatic speech recognition ...
The recent development of neural network-based automatic speech recognition (ASR) systems has greatl...
We propose a) a Language Agnostic end-to-end Speech Translation model (LAST), and b) a data augmenta...
Code-switching deals with the alternation of languages within a communication process. Training end-to-end (E2E) ...
Adapting Automatic Speech Recognition (ASR) models to new domains results in a deterioration of perf...
Recently, end-to-end speech translation (ST) has gained significant attention as it avoids error pro...
End-to-end formulation of automatic speech recognition (ASR) and speech translation (ST) makes it ea...