Transcription of multimedia data sources is often a challenging automatic speech recognition (ASR) task. The incorporation of visual features as additional contextual information as a means to improve ASR for this data has recently drawn attention from researchers. Our investigation extends existing ASR methods by using images and video titles to adapt a recurrent neural network (RNN) language model with a longshort term memory (LSTM) network. Our language model is tested on transcription of an existing corpus of instruction videos and on a new corpus consisting of lecture videos. Consistent reduction in perplexity by 5-10 is observed on both datasets. When the non-adapted model is combined with the image adaptation and video title adaptati...
Automatic speech recognition (ASR) incorporates knowledge and research in linguistics, computer scie...
Speaker adaptation of deep neural networks (DNNs) based acoustic models is still a challenging area ...
International audienceWe aim at improving spoken language modeling (LM) using very large amount of a...
Transcription of multimedia data sources is often a challenging automatic speech recognition (ASR) t...
Automatic speech recognition (ASR) permits effective interaction between humans and machines in envi...
We propose an end-to-end deep learning architecture for word level visual speech recognition. The sy...
The goal of this thesis is to advance the use of recurrent neural network language models (RNNLMs) ...
Videolectures are currently being digitised all over the world for its enormous value as reference r...
Long Short-Term Memory (LSTM) is a recurrent neural net-work (RNN) architecture specializing in mode...
Language models are a critical component of an automatic speech recognition (ASR) system. Neural net...
International audienceThis papers aims at improving spoken language modeling (LM) using very large a...
Recurrent neural network language models (RNNLMs) have been shown to consistently improve Word Error...
Recurrent neural network language models (RNNLMs) have re-cently become increasingly popular for man...
Speaker adaptation of deep neural networks (DNNs) based acoustic models is still a challenging area ...
Building a stochastic language model (LM) for speech recog-nition requires a large corpus of target ...
Automatic speech recognition (ASR) incorporates knowledge and research in linguistics, computer scie...
Speaker adaptation of deep neural networks (DNNs) based acoustic models is still a challenging area ...
International audienceWe aim at improving spoken language modeling (LM) using very large amount of a...
Transcription of multimedia data sources is often a challenging automatic speech recognition (ASR) t...
Automatic speech recognition (ASR) permits effective interaction between humans and machines in envi...
We propose an end-to-end deep learning architecture for word level visual speech recognition. The sy...
The goal of this thesis is to advance the use of recurrent neural network language models (RNNLMs) ...
Videolectures are currently being digitised all over the world for its enormous value as reference r...
Long Short-Term Memory (LSTM) is a recurrent neural net-work (RNN) architecture specializing in mode...
Language models are a critical component of an automatic speech recognition (ASR) system. Neural net...
International audienceThis papers aims at improving spoken language modeling (LM) using very large a...
Recurrent neural network language models (RNNLMs) have been shown to consistently improve Word Error...
Recurrent neural network language models (RNNLMs) have re-cently become increasingly popular for man...
Speaker adaptation of deep neural networks (DNNs) based acoustic models is still a challenging area ...
Building a stochastic language model (LM) for speech recog-nition requires a large corpus of target ...
Automatic speech recognition (ASR) incorporates knowledge and research in linguistics, computer scie...
Speaker adaptation of deep neural networks (DNNs) based acoustic models is still a challenging area ...
International audienceWe aim at improving spoken language modeling (LM) using very large amount of a...