The SWITCHBOARD (SWB) Corpus consists of 2430 conversations digitally recorded over long-distance telephone lines. It totals over 240 conversation hours (elapsed time) of data, with an average conversation duration of six minutes. The transcriptions contain more than 3 million words of text. The corpus includes more than 500 adult speakers and covers most major American English dialects. These statistics make SWB the premier database for telephone-bandwidth large vocabulary conversational speech recognition (LVCSR) research. The goal of this project is to resegment the speech data and correct the transcriptions in an effort to significantly advance LVCSR technology. We have completed the first six months of the SW...
The reproducibility of scientific studies grounded on language corpora require...
The DoubleTalk articulatory corpus was collected at the Edinburgh Speech Production Facility (ESPF) ...
This article describes a methodology for collecting text from the Web to match a target sublanguage ...
SWITCHBOARD (SWB) Corpus consists of 2438 conversations digitally recorded over long distance teleph...
In this paper we report recent developments on the meeting transcription task, a large vocabulary c...
We present a conversational telephone speech data set designed to support research on novel acoustic...
Models of speech recognition (by both human and machine) have traditionally assumed the phoneme to s...
Pronunciations in spontaneous, conversational speech tend to be much more variable tha...
This paper presents the 1997 BBN Byblos Large Vocabulary Speech Recognition (LVCSR) system. We give...
Recognition of conversational speech is one of the most challenging speech recognition tasks to date...
Replicability of scientific studies grounded on language corpora requires a ca...
The Linguistic Data Consortium’s Human Subjects Data Collection lab conducts cross-channel speech co...
Training a language model from conversational speech is difficult due to the large variation of the w...
In this paper we present a set of techniques we employed in our Janus Recognition Toolkit (JRTk) Swi...
Statistical language modelling may not only be used to uncover the patterns which underlie the compo...