Information retrieval systems rely heavily on models of similarity, but for spoken dialog such models currently rely mostly on standard textual-content similarity. As part of the MediaEval Benchmarking Initiative, we have created a new corpus to support the development of similarity models for spoken dialog. The corpus includes 26 casual dialogs among members of two semi-cohesive groups, totaling about 5 hours, with 1889 labeled regions grouped into 227 sets that annotators judged to be similar enough to share a tag. This technical report brings together information about the corpus and its intended uses that was previously available only on the project website.
Often users of information retrieval systems and document authors use different terms to refer to th...
Comparable corpora usually comprise a limited number of situation types, mostly defined by simplisti...
This article describes a methodology for collecting text from the Web to match a target sublanguage ...
Existing Conversational Agents (CAs) have several disadvantages. The most serious is that the CAs th...
The Semantic textual similarity (STS) task is commonly used to evaluate the semantic representations...
Standardized corpora are the foundation for spoken language research. In this work, we introduce an ...
Empirical spoken dialog research often involves the collection and analysis of a dialog corpus. Howe...
This study presents a method for measuring speakers similarity (the tendency f...
While many aspects of speech processing, including speech recognition and speech synthesis, have see...
We present a novel speech corpus collected with the primary aim of facilitating research in speaker ...
This research addresses the problem of deriving semantic similarity between words of language using ...
This paper investigates the use of recurrent surface text patterns to represen...
This paper describes the Natural Language Engineering and Pattern Recognition group (ELiRF) approac...
This article discusses the detection of discourse markers (DM) in dialog transcriptions, by human an...