This paper addresses the problem of separating and recognising speech in a monaural acoustic mixture containing competing speech sources. The proposed system treats sound source separation and speech recognition as tightly coupled processes. In the first stage, sound source separation is performed in the correlogram domain. For periodic sounds, the correlogram exhibits symmetric tree-like structures whose stems are located at the delays that correspond to multiples of the pitch period. These pitch-related structures are exploited to group spectral components at each time frame. Local pitch estimates are then computed for each spectral group and used to form simultaneous pitch tracks for temporal integration. These proc...
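The correlogram idea in the abstract above can be illustrated with a minimal sketch. It is not the paper's system: it assumes the per-channel signals are already available (three toy harmonics stand in for the outputs of an auditory filterbank), and the function names `autocorrelation`, `correlogram` and `summary_pitch_lag` are hypothetical. Each channel is autocorrelated over a range of delays, and the channels are then summed so that the delay shared by all harmonics, the pitch period, stands out.

```python
import math

def autocorrelation(frame, max_lag):
    """Short-time autocorrelation of one channel for lags 0..max_lag."""
    window = len(frame) - max_lag  # same number of products at every lag
    return [sum(frame[t] * frame[t + lag] for t in range(window))
            for lag in range(max_lag + 1)]

def correlogram(channels, max_lag):
    """One autocorrelation row per frequency channel (a single time frame)."""
    return [autocorrelation(ch, max_lag) for ch in channels]

def summary_pitch_lag(corr, min_lag):
    """Sum the rows across channels and return the lag of the largest peak."""
    n_lags = len(corr[0])
    summary = [sum(row[lag] for row in corr) for lag in range(n_lags)]
    return max(range(min_lag, n_lags), key=lambda lag: summary[lag])

# Toy input (an assumption, not data from the paper): three harmonics of a
# source whose pitch period is 80 samples, i.e. 100 Hz at fs = 8000 Hz.
fs = 8000
period = 80
n_samples = 400
channels = [[math.sin(2 * math.pi * k * t / period) for t in range(n_samples)]
            for k in (1, 2, 3)]

corr = correlogram(channels, max_lag=120)
lag = summary_pitch_lag(corr, min_lag=20)
pitch_hz = fs / lag
```

On this toy input every row peaks at the 80-sample delay, which is the vertical "stem" the abstract describes, so the summary picks that lag as the pitch period. Separating competing talkers would additionally require grouping channels by which stem they support, which this sketch does not attempt.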
Thesis (Ph. D.)--Harvard-MIT Division of Health Sciences and Technology, 2008. Includes bibliographic...
Communication by speech is intrinsic for humans. Since the breakthrough of mobile devices and wirele...
This paper reports the preliminary results of experiments on listening to several sounds at once. ‘I...
Conventional speech recognition is notoriously vulnerable to additive noise, and even the best compe...
Automatic Speech Recognition (ASR) engines are extremely susceptible to noise. There is an increasin...
Speaker models for blind source separation are typically based on HMMs consisting of vast numbers of...
Significant strides have been made in the field of automatic speech recognition over the past three ...
Automatic segregation of overlapping speech signals from single-channel recordings is a challenging ...
Analyzing sound mixtures into individual waveforms proves very difficult, except in constrained circ...
Looking at the speaker's face is useful to better hear a speech signal and ext...
An overview of work on recognizing speech in mixtures using missing data techniques and searching ac...
Speech separation by machines has been extensively studied for many decades and several algorithms a...
SPIE Defense, Security, and Sensing. Audio source separation aims to extract the...
The ‘cocktail party problem’ is the task of attending to a source of interest, usually speech, in a ...
While humans can easily segregate and track a speaker's voice in a loud noisy environment, most mode...