The written and spoken digits database is not a new database but one constructed from existing databases, in order to provide a ready-to-use resource for multimodal fusion.

The written digits database is the original MNIST handwritten digits database [1] with no additional processing. It consists of 70000 images (60000 for training and 10000 for test), each of 28 x 28 = 784 dimensions.

The spoken digits database was extracted from Google Speech Commands [2], an audio dataset of spoken words that was proposed to train and evaluate keyword spotting systems. That dataset consists of 105829 utterances of 35 words, among which 38908 are utterances of the ten digits (34801 for training and 4107 for test). Pre-processing was done via the extraction of the Mel Fre...
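The split sizes quoted above can be checked with a few lines of Python. This is only an illustrative sketch: the names `Split`, `WRITTEN`, `SPOKEN` and `flatten` are hypothetical and are not part of the database or of any loader distributed with it. The numbers are the ones given in the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Split:
    """Sample counts for the training and test partitions (hypothetical helper)."""
    train: int
    test: int

    @property
    def total(self) -> int:
        return self.train + self.test


# Figures quoted in the description; the variable names are assumptions.
WRITTEN = Split(train=60000, test=10000)  # MNIST images
SPOKEN = Split(train=34801, test=4107)    # digit utterances from Speech Commands
IMAGE_SHAPE = (28, 28)


def flatten(image):
    """Turn a 28 x 28 nested list of pixels into a flat list of 784 values."""
    return [pixel for row in image for pixel in row]


# A blank image stands in for a real MNIST sample.
blank = [[0] * IMAGE_SHAPE[1] for _ in range(IMAGE_SHAPE[0])]
print(WRITTEN.total, SPOKEN.total, len(flatten(blank)))  # 70000 38908 784
```

The two modalities have different sizes (70000 images against 38908 utterances), so any fusion experiment has to decide how to pair samples across them. The sketch deliberately leaves that choice open.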