We develop and evaluate models for automatic vision-based voice activity detection (VAD) in multiparty human-human interactions, aimed at complementing acoustic VAD methods. We provide evidence that vision-based VAD models of this type are susceptible to spatial bias in the dataset used for their development: the physical setting of the interaction, usually constant throughout data acquisition, determines the distribution of the participants' head poses. Our results show that when the head pose distributions differ significantly between the training and test sets, the performance of the vision-based VAD models drops significantly. This suggests that previously reported results on datasets with a fixed physical configuration may ov...
Visual voice activity detection (V-VAD) uses visual features to predict whethe...
This paper presents a self-supervised method for visual detection of the active speaker in a multi-p...
Current voice activity detection methods generally utilise only acoustic information. Therefore they...
Visual activity detection of lip movements can be used to overcome the poor performance of voice act...
An audio-visual voice activity detector that uses sensors positioned distantly from the speaker is p...
Humans can extract speech signals that they need to understand from a mixture of background noise, in...
In this paper we present two novel methods for visual voice activity detection (V-VAD) which exploit...
This work has been funded by the EU H2020 project #871245 SPRING and by the Multidisciplinary Instit...
RealVAD: A Real-world Dataset for Voice Activity Detection. The task of automatically detecting “Who...