Visual speech recognition (VSR) aims to recognize the content of speech based on lip movements, without relying on the audio stream. Advances in deep learning and the availability of large audio-visual datasets have led to the development of much more accurate and robust VSR models than ever before. However, these advances are usually due to larger training sets rather than to model design. Here we demonstrate that designing better models is as important as using larger training sets. We propose the addition of prediction-based auxiliary tasks to a VSR model, and highlight the importance of hyperparameter optimization and appropriate data augmentations. We show that such a model works for different languages and outperforms all...
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer S...
Abstract — Visual speech information from the speaker’s mouth region has been successfully shown to ...
This is the repository of Visual Speech Recognition for Multiple Languages, which is the successor o...
Speech is the most natural means of communication for humans. Therefore, since the beginning of comp...
This paper investigates multimodal sensor architectures with deep learning for audio-visual speech r...
Visual speech, referring to the visual domain of speech, has attracted increasing attention due to i...
In this paper we propose a new learning-based representation that is referred to as Visual Speech Un...
This paper presents a novel metric learning approach to address the performance gap between normal a...
Visual Speech Recognition (VSR) is the process of recognizing or interpreting speech by watching the...
We study the problem of automatic visual speech recognition (VSR) using dynamic Bayesian network (DB...
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Comp...
Standard Visual Speech Recognition (VSR) systems directly process images as in...
In visual speech recognition (VSR), speech is transcribed using only visual information to interpret...
This thesis describes how an automatic lip reader was realized. Visual speech recognition is a preco...
This dissertation presents a new learning-based representation that is referred to as a Visual Spee...