Articulatory features (AFs) provide language-independent attribute by exploiting the speech production knowl-edge. This paper proposes a cross-lingual automatic speechrecognition (ASR) based on AF methods. Various neural network(NN) architectures are explored to extract cross-lingual AFs andtheir performance is studied. The architectures include muti-layer perception(MLP), convolutional NN (CNN) and long short-term memory recurrent NN (LSTM). In our cross-lingual setup,only the source language (English, representing a well-resourcedlanguage) is used to train the AF extractors. AFs are thengenerated for the target language (Mandarin, representing anunder-resourced language) using the trained extractors. Theframe-classification accuracy indic...
The recent development of neural network-based automatic speech recognition (ASR) systems has greatl...
Multilingual speech recognition systems mostly benefit low resource languages but suffer degradation...
This study addresses unsupervised subword modeling, i.e., learning acoustic feature representations ...
Phonological-based features (articulatory features, AFs) describe the movements of the vocal organ w...
Automatic speech recognition requires many hours of transcribed speech recordings in order for an ac...
© 2014 IEEE. Speech signals are produced by the smooth and continuous movements of the human articul...
Exploiting cross-lingual resources is an effective way to compensate for data scarcity of low resour...
Articulatory features describe the way in which the speech organs are used when producing speech sou...
In recent years, the features derived from posteriors of a multilayer perceptron (MLP), known as tan...
<p>This paper describes the integration of language identification (LID) into a multilingual automat...
The use of articulatory features, such as place and manner of articulation, has been shown to reduce...
This study addresses unsupervised subword modeling, i.e., learning acoustic feature representations ...
In the recent years, along with the development of artificial intelligence (AI) and man-machine inte...
The use of articulatory features, such as place and manner of articulation, has been shown to reduce...
Multilingual automatic speech recognition (ASR) systems mostly benefit low resource languages but su...
The recent development of neural network-based automatic speech recognition (ASR) systems has greatl...
Multilingual speech recognition systems mostly benefit low resource languages but suffer degradation...
This study addresses unsupervised subword modeling, i.e., learning acoustic feature representations ...
Phonological-based features (articulatory features, AFs) describe the movements of the vocal organ w...
Automatic speech recognition requires many hours of transcribed speech recordings in order for an ac...
© 2014 IEEE. Speech signals are produced by the smooth and continuous movements of the human articul...
Exploiting cross-lingual resources is an effective way to compensate for data scarcity of low resour...
Articulatory features describe the way in which the speech organs are used when producing speech sou...
In recent years, the features derived from posteriors of a multilayer perceptron (MLP), known as tan...
<p>This paper describes the integration of language identification (LID) into a multilingual automat...
The use of articulatory features, such as place and manner of articulation, has been shown to reduce...
This study addresses unsupervised subword modeling, i.e., learning acoustic feature representations ...
In the recent years, along with the development of artificial intelligence (AI) and man-machine inte...
The use of articulatory features, such as place and manner of articulation, has been shown to reduce...
Multilingual automatic speech recognition (ASR) systems mostly benefit low resource languages but su...
The recent development of neural network-based automatic speech recognition (ASR) systems has greatl...
Multilingual speech recognition systems mostly benefit low resource languages but suffer degradation...
This study addresses unsupervised subword modeling, i.e., learning acoustic feature representations ...