Communication using speech is often an audio-visual experience. Listeners hear what is being uttered by speakers and also see the corresponding facial movements and other gestures. This thesis is an attempt to exploit this bimodal (audio-visual) nature of speech for speaker separation. In addition to the audio speech features, visual speech features are used to achieve the task of speaker separation. An analysis of the correlation between audio and visual speech features is carried out first. This correlation between audio and visual features is then used in the estimation of clean audio features from visual features using Gaussian Mixture Models (GMMs) and Maximum a Posteriori (MAP) estimation. For speaker separation three methods ar...
Humans with normal hearing ability are generally skilful in listening selectively to a particular sp...
Humans are skilled in selectively extracting a single sound source in the presence of multiple simul...
A speaker extraction algorithm seeks to extract the target speaker's speech from a multi-talker spee...
This work examines whether visual speech information can be effective within audio masking-based s...
This paper examines the utility of audio-visual speech for the two related tasks of speech and speak...
Speech separation is the task of segregating a target speech signal from background interference. To...
The aim of the work conducted in this thesis is to reconstruct audio speech signals using informatio...
In this paper we present an overview of recent research in the area of audio-visual blind source sep...
Looking at the speaker's face is useful to hear better a speech signal and ext...
Speechreading increases intelligibility in human speech perception. This suggests that conventional ...
The separation of speech signals measured at multiple microphones in noisy and...
Recent studies show that facial information contained in visual speech can be helpful for the perfor...
The aim of the work in this thesis is to explore how visual speech can be used within monaural maski...
The cocktail party problem is one of following a conversation in a crowded room where there are many...