Transcription of multimedia data sources is often a challenging automatic speech recognition (ASR) task. The incorporation of visual features as additional contextual information as a means to improve ASR for this data has recently drawn attention from researchers. Our investigation extends existing ASR methods by using images and video titles to adapt a recurrent neural network (RNN) language model with a longshort term memory (LSTM) network. Our language model is tested on transcription of an existing corpus of instruction videos and on a new corpus consisting of lecture videos. Consistent reduction in perplexity by 5-10 is observed on both datasets. When the non-adapted model is combined with the image adaptation and video title a...
International audienceThis paper discusses the adaptation of vocabularies for automatic speech recog...
Current automatic speech recognition (ASR) systems are based on language models (LM) which gather wo...
International audienceWe aim at improving spoken language modeling (LM) using very large amount of a...
Transcription of multimedia data sources is often a challenging automatic speech recognition (ASR) t...
Videolectures are currently being digitised all over the world for its enormous value as reference r...
Recurrent neural network language models (RNNLMs) have consistently outperformed n-gram language mod...
International audienceThis papers aims at improving spoken language modeling (LM) using very large a...
This research addresses the language model (LM) domain mismatch problem in automatic speech recognit...
In this work, we study the impact of Large-scale Language Models (LLM) on Automated Speech Recogniti...
The goal of this thesis is to advance the use of recurrent neural network language models (RNNLMs) ...
International audienceThis paper discusses the adaptation of speech recognition vocabularies for aut...
Language models are a critical component of an automatic speech recognition (ASR) system. Neural net...
Modern automatic speech recognition (ASR) systems are speaker independent and designed to recognize ...
Recurrent neural network language models (RNNLMs) generally outperform n-gram language models when u...
International audienceIn several ASR use cases, training and adaptation of domain-specific LMs can o...
International audienceThis paper discusses the adaptation of vocabularies for automatic speech recog...
Current automatic speech recognition (ASR) systems are based on language models (LM) which gather wo...
International audienceWe aim at improving spoken language modeling (LM) using very large amount of a...
Transcription of multimedia data sources is often a challenging automatic speech recognition (ASR) t...
Videolectures are currently being digitised all over the world for its enormous value as reference r...
Recurrent neural network language models (RNNLMs) have consistently outperformed n-gram language mod...
International audienceThis papers aims at improving spoken language modeling (LM) using very large a...
This research addresses the language model (LM) domain mismatch problem in automatic speech recognit...
In this work, we study the impact of Large-scale Language Models (LLM) on Automated Speech Recogniti...
The goal of this thesis is to advance the use of recurrent neural network language models (RNNLMs) ...
International audienceThis paper discusses the adaptation of speech recognition vocabularies for aut...
Language models are a critical component of an automatic speech recognition (ASR) system. Neural net...
Modern automatic speech recognition (ASR) systems are speaker independent and designed to recognize ...
Recurrent neural network language models (RNNLMs) generally outperform n-gram language models when u...
International audienceIn several ASR use cases, training and adaptation of domain-specific LMs can o...
International audienceThis paper discusses the adaptation of vocabularies for automatic speech recog...
Current automatic speech recognition (ASR) systems are based on language models (LM) which gather wo...
International audienceWe aim at improving spoken language modeling (LM) using very large amount of a...