Visually grounded dialog systems, which integrate multiple modes of communication such as text and visual inputs, have become an increasingly popular area of investigation. However, the absence of a standardized evaluation framework makes it difficult to assess progress in this field. To this end, we propose \textbf{VDialogUE}, a \textbf{V}isually-grounded \textbf{Dialog}ue benchmark for \textbf{U}nified \textbf{E}valuation. It defines five core multi-modal dialogue tasks and covers six datasets. Furthermore, to provide a comprehensive assessment of a model's performance across all tasks, we develop a novel evaluation metric called VDscore, which is based on the Analytic Hierarchy Process~(AHP) method. Additionally, we...
Building a universal conversational agent has been a long-standing goal of the dialogue research com...
In this paper, the semantic and pragmatic modules of a spoken dialogue system development platform a...
Having an intelligent assistant that can communicate with humans to serve their needs is a fundament...
The Visual Dialog task requires a model to exploit both image and conversational context information...
The Metalogue project aims to develop a multi-modal, multi-party dialogue system with metacognitive ...
The goal of this paper is to define a methodology for the end-to-end evaluation of the multimodal di...
Evaluating generation systems ● Generation is an open-ended task – Like “counting from zero to infin...
The process of "conversational grounding" is an interactive process that has b...
Practical dialog systems need to deal with various knowledge sources, noisy user expressions, and th...
A well-designed interactive human-like dialogue system is expected to take actions (e.g. smiling) an...
Despite recent progress in open-domain dialogue evaluation, how to develop automatic metrics remains...
Spoken dialogue systems -- computers that interact with humans through spoken conversati...
Image-grounded dialogue systems benefit greatly from integrating visual information, resulting in hi...