The first part of this thesis focuses on very low-dimensional bottleneck features (BNFs), extracted from deep neural networks (DNNs) for speech analysis and recognition. Very low-dimensional BNFs are analysed in terms of their capability of representing speech and their suitability for modelling speech dynamics. Nine-dimensional BNFs obtained from a phone discrimination DNN are shown to give comparable phone recognition accuracy to 39-dimensional MFCCs, and an average of 34% higher phone recognition accuracy than formant-based features of the same dimensions. They also preserve the trajectory continuity well and thus hold promise for modelling speech dynamics. Visualisations and interpretations of the BNFs are presented, with phonetically m...
The recent speaker embeddings framework has been shown to provide excellent performance on the task ...
In this paper, the pre-training method based on denoising auto-encoder is investigated and proved to...
Automatic speech recognition (ASR) is a key core technology for the information age. ASR systems hav...
Effective representation plays an important role in automatic spoken language identification (LID). ...
Deep learning and neural network research has grown significantly in the fields of automatic speech ...
Recently, automatic speech recognition (ASR) systems that use deep neural networks (DNNs) for acoust...
Language recognition systems based on bottleneck features have recently become the state-of-the-art ...
This thesis makes three main contributions to the area of speech recognition with Deep Neural Networ...
Language recognition systems based on bottleneck features have recently become the state-of-the-art ...
We have recently proposed a new acoustic model based on prob-abilistic linear discriminant analysis ...
International audienceAutomatic speech recognition is complementary to language recognition. The lan...
A defining problem in spoken language identification (LID) is how to design effective representation...
Over these last few years, the use of Artificial Neural Networks (ANNs), now often referred to as de...
Representation learning is a fundamental ingredient of deep learning. However, learning a good repre...
The development of a speech recognition system requires at least three resources: a large labeled sp...
The recent speaker embeddings framework has been shown to provide excellent performance on the task ...
In this paper, the pre-training method based on denoising auto-encoder is investigated and proved to...
Automatic speech recognition (ASR) is a key core technology for the information age. ASR systems hav...
Effective representation plays an important role in automatic spoken language identification (LID). ...
Deep learning and neural network research has grown significantly in the fields of automatic speech ...
Recently, automatic speech recognition (ASR) systems that use deep neural networks (DNNs) for acoust...
Language recognition systems based on bottleneck features have recently become the state-of-the-art ...
This thesis makes three main contributions to the area of speech recognition with Deep Neural Networ...
Language recognition systems based on bottleneck features have recently become the state-of-the-art ...
We have recently proposed a new acoustic model based on prob-abilistic linear discriminant analysis ...
International audienceAutomatic speech recognition is complementary to language recognition. The lan...
A defining problem in spoken language identification (LID) is how to design effective representation...
Over these last few years, the use of Artificial Neural Networks (ANNs), now often referred to as de...
Representation learning is a fundamental ingredient of deep learning. However, learning a good repre...
The development of a speech recognition system requires at least three resources: a large labeled sp...
The recent speaker embeddings framework has been shown to provide excellent performance on the task ...
In this paper, the pre-training method based on denoising auto-encoder is investigated and proved to...
Automatic speech recognition (ASR) is a key core technology for the information age. ASR systems hav...