This paper presents a novel metric learning approach to address the performance gap between normal and silent speech in visual speech recognition (VSR). The difference in lip movements between the two poses a challenge for existing VSR models, which exhibit degraded accuracy when applied to silent speech. To solve this issue and tackle the scarcity of training data for silent speech, we propose to leverage the shared literal content between normal and silent speech and present a metric learning approach based on visemes. Specifically, we aim to map the input of two speech types close to each other in a latent space if they have similar viseme representations. By minimizing the Kullback-Leibler divergence of the predicted viseme probability ...
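The abstract describes aligning the predicted viseme probability distributions of normal and silent speech by minimizing their Kullback-Leibler divergence. A minimal sketch of that loss term is below; the three-viseme distributions and the function name are illustrative assumptions, not the paper's actual implementation.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete probability distributions.

    eps guards against log(0) when a viseme receives zero probability.
    """
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Hypothetical predicted viseme distributions for the same utterance,
# spoken normally vs. silently (toy three-viseme inventory).
p_normal = [0.7, 0.2, 0.1]
p_silent = [0.5, 0.3, 0.2]

# Driving this value toward zero pulls the two speech types together
# in the shared latent space.
alignment_loss = kl_divergence(p_normal, p_silent)
```

Note that KL divergence is asymmetric; which direction is minimized (normal-to-silent or silent-to-normal) is a design choice of the training objective.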
In machine lip-reading, which is identification of speech from visual-only information, there is evi...
In this paper we propose a new learning-based representation that is referred to as Visual Speech Un...
Recent improvements in tracking and feature extraction mean that speaker-dependent lip-reading of co...
Visual speech recognition (VSR) aims to recognize the content of speech based on lip movements, with...
The goal of this paper is to learn strong lip reading models that can recognise speech in silent vid...
Presented at: FG 2017 12th IEEE International Conference on Automatic Face and Gesture R...
In this work, we propose a technique to transfer speech recognition capabilities from audio speech r...
Visual lip gestures observed whilst lipreading have a few working definitions, the most common two a...
Presented at: FG 2017 12th IEEE International Conference on Automatic Face and Gesture R...
To undertake machine lip-reading, we try to recognise speech from a visual signal. Current work ofte...
Silent speech interfaces (SSIs), which recognize speech from articulatory information (i.e., without...
Lip reading, the ability to recognize text information from the movement of a speaker's mouth, is a ...
Lipreading is understanding speech from observed lip movements. An observed series of lip motions is...
In machine lip-reading there is continued debate and research around the correct classes to be used ...
This article investigates the use of statistical mapping techniques for the co...