The Grid Corpus is a large multitalker audiovisual sentence corpus designed to support joint computational-behavioral studies in speech perception. In brief, the corpus consists of high-quality audio and video (facial) recordings of 1000 sentences spoken by each of 34 talkers (18 male, 16 female), for a total of 34,000 sentences. Sentences are of the form "put red at G9 now". The archives are organized as follows: audio_25k.zip contains the utterances in WAV format at a 25 kHz sampling rate, in a separate directory per talker; alignments.zip provides word-level time alignments, again separated by talker; s1.zip, s2.zip, etc. contain the .jpg videos for each talker [note that, due to an oversight, no video is available for talker t21]. The Grid Corpus is described in detail in the paper jasa...
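The word-level alignments can be read with a few lines of Python. The sketch below is a minimal example under stated assumptions: it assumes each line of an alignment file holds `start end word`, that the times are sample indices at 25 kHz (matching audio_25k.zip), and that `sil`/`sp` mark pauses. None of these details is given in the description above, so check them against the actual files. The names `parse_alignment`, `spoken_words` and `WordSpan` are hypothetical, and the example alignment is made up.

```python
from typing import List, NamedTuple

# ASSUMPTION: alignment time stamps are sample indices at 25 kHz.
SAMPLE_RATE = 25_000

# ASSUMPTION: these tokens mark silence or short pauses, not spoken words.
SILENCE_TOKENS = {"sil", "sp"}


class WordSpan(NamedTuple):
    word: str
    start_s: float  # start time in seconds
    end_s: float    # end time in seconds


def parse_alignment(text: str, sample_rate: int = SAMPLE_RATE) -> List[WordSpan]:
    """Parse lines of the assumed form 'start end word' into spans in seconds."""
    spans: List[WordSpan] = []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        parts = line.split()
        if len(parts) != 3:
            raise ValueError(f"line {line_no}: expected 'start end word', got {line!r}")
        start, end, word = int(parts[0]), int(parts[1]), parts[2]
        if end < start:
            raise ValueError(f"line {line_no}: end {end} precedes start {start}")
        spans.append(WordSpan(word, start / sample_rate, end / sample_rate))
    return spans


def spoken_words(spans: List[WordSpan]) -> List[WordSpan]:
    """Keep only spans whose token is not an assumed silence marker."""
    return [s for s in spans if s.word not in SILENCE_TOKENS]


# Usage with a made-up alignment (illustrative values, not corpus data).
example = """0 12500 sil
12500 20000 put
20000 27500 red
27500 31250 at
31250 37500 g
37500 45000 nine
45000 52500 now
52500 75000 sil"""
spans = parse_alignment(example)
print(" ".join(s.word for s in spoken_words(spans)))  # prints: put red at g nine now
print(spans[-1].end_s)  # prints: 3.0
```

Converting to seconds at parse time keeps the spans directly comparable with positions in the audio; if the time stamps turn out to use a different unit, only `sample_rate` needs to change.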
Communication between humans deeply relies on our capability of experiencing, expressing, and recogn...
This paper presents an augmentation of the MSCOCO dataset where speech is added to...
The Places Audio Caption (Japanese) 100K Corpus contains approximately 100,000 Japanese spoken capti...
Lombard Grid is a bi-view audiovisual Lombard speech corpus which can be used to support joint compu...
Seeing a speaker's face can help substantially in understanding them, in particular in challenging l...
Immersive stereoscopic footage of a Coordinate Response Measure (CRM) recorded from two actors. The ...
We present SpeakingFaces as a publicly available large-scale multimodal dataset developed to support...
Audiovisual speech recognition (AVSR) systems have been proven superior to audio-only speech recog...
This dataset contains audio stimuli (wav format, 16 kHz) used to test the perception of speech that ...
What is ASPIRE? ASPIRE is a first-of-its-kind audiovisual speech corpus recorded in real noisy e...
Dataset accompanying the paper "Objective speech outcomes after surgical treatment for oral cancer: ...
This dataset contains the stimuli heard by participants in a study of sculpted speech. The zip file ...
This is a modified version of the speech audio contained within the Ryerson Audio-Visual Database of...
This archive contains the video features in Kaldi's [1] ark format that correspond to the CHiME-2 Tr...
The Sharvard Corpus is both a list of phonemically-balanced Spanish sentences and recordings of the ...