Repurposing Corpora for Speech Repair Detection: Two Experiments

Simon Zwarts
Mark Johnson
Robert Dale

Publication date

October 2015

Abstract

Unrehearsed spoken language often contains many disfluencies. If we want to correctly interpret the content of spoken language, we need to be able to detect these disfluencies and deal with them appropriately. In the work described here, we use a statistical noisy channel model to detect disfluencies in transcripts of spoken language. Like all statistical approaches, this is naturally very data-hungry; however, corpora containing transcripts of unrehearsed spoken language with disfluencies annotated are a scarce resource, which makes training difficult. We address this issue in the following ways: First, since written textual corpora are much more abundant than speech corpora, we see whether using a large text corpus to increase the d...
