Submitted to ICASSP 2020

We investigate the effect of speaker localization on the performance of speech recognition systems in a multispeaker, multichannel environment. Given the speaker location information, speech separation is performed in three stages. In the first stage, a simple delay-and-sum (DS) beamformer enhances the signal impinging from the speaker location. In the second stage, a neural network uses the enhanced signal to estimate a time-frequency mask corresponding to the localized speaker. In the third stage, this mask is used to compute the second-order statistics and to derive an adaptive beamformer. We generated a multichannel, multispeaker, reverberated, noisy dataset inspired by the well-studied WSJ0-2mix and study the performance ...
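The first and third stages described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the far-field (plane-wave) steering model, the function names `steering_vector`, `delay_and_sum` and `masked_covariance`, the array shapes, and the speed of sound `c = 343.0` m/s are all assumptions made for illustration. The neural-network mask estimator of the second stage is not modeled; `mask` is simply taken as given.

```python
import numpy as np


def steering_vector(mic_positions, direction, freqs, c=343.0):
    """Far-field steering vectors, shape (F, M). Assumed plane-wave model."""
    u = np.asarray(direction, dtype=float)
    u = u / np.linalg.norm(u)  # unit vector pointing toward the speaker
    # Relative arrival time per microphone in seconds: microphones closer
    # to the speaker receive the wavefront earlier (negative delay).
    delays = -(np.asarray(mic_positions, dtype=float) @ u) / c  # (M,)
    return np.exp(-2j * np.pi * np.outer(freqs, delays))  # (F, M)


def delay_and_sum(stft, steer):
    """Stage 1: align the channels toward the speaker and average them.

    stft:  complex STFT of the array signals, shape (M, F, T)
    steer: steering vectors, shape (F, M)
    returns the enhanced single-channel STFT, shape (F, T)
    """
    num_mics = stft.shape[0]
    return np.einsum("fm,mft->ft", steer.conj(), stft) / num_mics


def masked_covariance(stft, mask, eps=1e-10):
    """Stage 3 input: mask-weighted spatial covariance per frequency.

    stft: complex STFT, shape (M, F, T)
    mask: real-valued time-frequency mask in [0, 1], shape (F, T)
    returns covariance matrices, shape (F, M, M)
    """
    num = np.einsum("ft,mft,nft->fmn", mask, stft, stft.conj())
    return num / (mask.sum(axis=1)[:, None, None] + eps)
```

In this sketch, `masked_covariance(stft, mask)` and `masked_covariance(stft, 1.0 - mask)` would give target and interference statistics from which an adaptive beamformer could be derived. The abstract does not say which adaptive beamformer is used, so that step is left out.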
Master's thesis in Computer science. The cocktail party problem, also known as a single-channel multi-...
Speech recognition in multi-channel environments requires target speaker localization, multi-channel...
The human auditory system uses masking as one of the primary mechanisms for robust perception of speech ...
When speech is captured with a distant microphone, it includes distortions caused by noise, reverber...
Abstract. Interest within the automatic speech recognition research community has recently focused o...
Voice-based personal assistants are part of our daily lives. Their performance suffers in the presen...
Speaker localization is a hard task, especially in adverse environmental condi...
The robust localization of speech sources is required for a wide range of applications, among them h...
Speech localisation in multitalker mixtures is affected by the listener’s expectations about the spa...