Syntactic parsing is a fundamental natural language processing technology that has proven useful in machine translation, language modeling, sentence segmentation, and a number of other applications related to speech translation. However, there is a paucity of manually annotated syntactic parsing resources for speech, and particularly for the lecture speech that is the current target of the IWSLT translation campaign. In this work, we present a new manually annotated treebank of TED talks that we hope will prove useful for investigation into the interaction between syntax and these speech-related applications. The first version of the corpus includes 1,217 sentences and 23,158 words manually annotated with parse trees, and aligned with tra...
This paper describes ODIL Syntax, a French treebank built on spontaneous speec...
This paper presents the corpus developed by the LIUM for Automatic Speech Recognition (ASR), based o...
This document summarizes the major work on discourse processing in the Enthusiast Spanish-English sp...
Transcribing lectures is a challenging task, both in acoustic and in language modeling. In this work...
We introduce TED-Multilingual Discourse Bank, a corpus of TED talks transcripts in 6 languages (Engl...
We report on the success of a two-pass approach to annotating metadata, speech effects and syntactic...
End-to-end spoken language translation (SLT) has recently gained popularity thanks to the advancemen...
We present an English–Korean speech translation corpus, named EnKoST-C. End-to-end model training fo...
In this paper, we present improvements made to the TED-LIUM corpus we released in 2012. These enhanc...
We present the Multilingual TEDx corpus, built to support speech recognition (ASR) and speech transl...
Creating syntactic trees for spoken language data involves many challenges. The problems pertain to ...
This paper describes the TED Corpus Search Engine (TCSE), an online corpus system that searc...
This paper describes a syntactic annotation platform (Contemplata) that integr...
While we have seen significant advances in automatic summarization for text, research on speech summ...