Automatic closed-captioning of video is a useful application of speech recognition technology but poses numerous challenges when applied to open-domain user-uploaded videos such as those on YouTube. In this work, we explore a strategy to improve decoding accuracy for video transcription by decoding each video with a language model (LM) adapted specifically to the topics that the video covers. Taxonomic topic classifiers are used to determine the topic content of videos and to build a large set of topic-specific LMs from web documents. We consider strategies for selecting and interpolating LMs in both supervised and unsupervised scenarios in a two-pass lattice rescoring framework. Experiments on a YouTube video corpus show a 3.6 absolute red...
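As a rough illustration of the interpolation idea described in the abstract above, the following minimal sketch linearly interpolates several topic LMs and uses the mixture to rescore candidate transcripts. It is not the authors' implementation: it rescores a small n-best list rather than a lattice, the LMs are toy unigram tables, and the names `interpolate`, `rescore`, `unigram_lm`, and the weights are all hypothetical choices made for this example.

```python
import math

def interpolate(lm_probs, weights):
    """Linear interpolation: P(w|h) = sum_i lambda_i * P_i(w|h).
    Weights are renormalised so they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return sum((lam / total) * p for lam, p in zip(weights, lm_probs))

def unigram_lm(table, floor=1e-4):
    """Toy LM (assumption): ignores history and looks words up in a table."""
    return lambda word, history: table.get(word, floor)

def rescore(hypotheses, topic_lms, weights, lm_scale=1.0):
    """Return the word sequence with the best combined score.
    hypotheses: list of (words, acoustic_logprob) pairs
    topic_lms:  list of callables lm(word, history) -> probability
    """
    best_score, best_words = None, None
    for words, am_logp in hypotheses:
        lm_logp = 0.0
        for i, w in enumerate(words):
            hist = tuple(words[:i])
            p = interpolate([lm(w, hist) for lm in topic_lms], weights)
            lm_logp += math.log(p)
        score = am_logp + lm_scale * lm_logp
        if best_score is None or score > best_score:
            best_score, best_words = score, words
    return best_words

# Hypothetical generic and cooking-topic LMs.
generic = unigram_lm({"risk": 0.02, "whisk": 0.001, "the": 0.1, "eggs": 0.005})
cooking = unigram_lm({"risk": 0.001, "whisk": 0.03, "the": 0.1, "eggs": 0.03})

# Two first-pass hypotheses; the acoustic score slightly prefers "risk".
nbest = [(["risk", "the", "eggs"], -10.0), (["whisk", "the", "eggs"], -10.2)]

generic_only = rescore(nbest, [generic, cooking], [1.0, 0.0])
topic_adapted = rescore(nbest, [generic, cooking], [0.5, 0.5])
```

In this toy setup the interpolation weights stand in for the output of a topic classifier; in the supervised and unsupervised scenarios the abstract mentions, they would presumably be derived from video metadata or from a first-pass transcript.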
Building a stochastic language model (LM) for speech recognition requires a large corpus of target ...
Whereas topic-based adaptation of language models (LM) claims to increase the ...
To perform real-world information processing, such as intelligent robotics, multimodal dialogue syste...
This paper discusses the adaptation of speech recognition vocabularies for aut...
Language models used in current automatic speech recognition systems are trained on general-purpose ...
Transcription of multimedia data sources is often a challenging automatic speech recognition (ASR) t...
This paper integrates techniques in natural language processing and computer vision to improve recog...
In this work, we present a method for automatic topic classification of educational videos using a s...
In this paper, an approach for unsupervised dynamic adaptation of the language model used in an auto...
The gradual migration of television from broadcast diffusio...
Dense video captioning is a task of localizing interesting events from an untrimmed video and produc...
Recent progress in using Long Short-Term Memory (LSTM) for image description has motivated the explo...
Accepted at ICCV 2019. Learning text-video embeddings usually requires a dataset...
Video lectures are currently being digitised all over the world for their enormous value as reference r...