We present an audiovisual speech corpus that is designed for cognitive neuroscience studies and that can also be employed for research on audiovisual speech recognition. The corpus consists of 3.6 hours of audiovisual recordings of two speakers, one male and one female, reading passages from a narrative English text. The visual recordings were acquired at a high frame rate of 119.88 frames per second (fps) and exported at a high resolution of 528×718 pixels. The speech is pronounced with a neutral British accent and is directed at the camera. Both speakers read the same 59 passages of a book, for a total of 1 h 50 min each. The passage scripts, largely contiguous within a non-fiction source book chosen for its compelling content, were selected a...
THESIS 11265. Seeing a speaker's face as he or she talks can greatly help in understanding what the sp...
The multimodal character of speech processing has attracted research endeavors that range from engin...
Face-to-face communication involves both hearing and seeing speech. Heard and seen speech inputs int...
Seeing a speaker's face can help substantially in understanding them, in particular in challenging l...
This fMRI study investigated the effect of seeing articulatory movements of a speaker while listenin...
Immersive stereoscopic footage of a Coordinate Response Measure (CRM) recorded from two actors. The ...
The Grid Corpus is a large multitalker audiovisual sentence corpus designed to support joint computa...
In this paper we discuss the design, acquisition and preprocessing of a Czech audio-visual speech co...
Perceptual processes mediating recognition, including the recognition of objects and spoken words, i...
fMRI was used to assess the relationship between brain activation and the degree of audiovisual inte...
In this paper we discuss the design, acquisition and preprocessing of a Czech audio-visual speech co...
Audiovisual speech recognition (AVSR) systems have been proven superior over audio-only speech recog...
Speech production and perception are two of the most complex actions humans perform. The processing ...
The many aspects of audiovisual (AV) speech processing have attracted a loyal following from a wide ...
Human speech processing (perception and in some cases production) is approached from three levels. A...