Spoken corpora have traditionally been assembled through careful recording and transcription of discourse events, a process that is labour-intensive and often restricted in the breadth of recording contexts available. To address these challenges in spoken corpus compilation, we explore crowdsourcing language samples reported by participants. We investigate the precision and recall of the ‘crowd’ in reporting language they have heard in certain contexts, alongside the use of a crowdsourcing toolkit to facilitate this task. As a focussing device for the selection of reported language samples, we draw on formulaic phrases, an area that has received considerable a...
A corpus is a collection of authentic, non-elicited texts selected and assembled to study lan...
This article provides an overview of methodological and technical issues that arise in the collectio...
This article describes and critically examines the challenging task of compiling The London–Lund Cor...
Corpora have revolutionised the way we describe and analyse language in use. The sheer scale of coll...
This paper introduces the Spoken British National Corpus 2014, an 11.5-million-word corpus of orthog...
Text corpora represent the foundation on which most natural language processin...
This paper introduces the Spoken British National Corpus 2014, an 11-million-word corpus of orthogra...
We investigate algorithms and tools for the semi-automatic authoring of grammars for ...
Statistical language modelling may not only be used to uncover the patterns which underlie the compo...
This talk reports on the compilation of the new London–Lund Corpus (LLC–2), a corpus of contemporary...
DIT’s prototype speech corpus allows language learners and researchers access to real, informal dial...
We explore the use of crowdsourcing to generate natural language in spoken dialogue systems. We int...
Most previous work on trainable language generation has focused on two paradigms: (a) using a statis...