Practical dialog systems need to deal with various knowledge sources, noisy user expressions, and the shortage of annotated data. To address these problems, we propose CGoDial, a new challenging and comprehensive Chinese benchmark for multi-domain Goal-oriented Dialog evaluation. It contains 96,763 dialog sessions and 574,949 dialog turns in total, covering three datasets with different knowledge sources: 1) a slot-based dialog (SBD) dataset with table-formed knowledge, 2) a flow-based dialog (FBD) dataset with tree-formed knowledge, and 3) a retrieval-based dialog (RBD) dataset with candidate-formed knowledge. To bridge the gap between academic benchmarks and spoken dialog scenarios, we either collect data from real conversations or add...
The development of large language models (LLMs) such as ChatGPT has brought a lot of attention recen...
The recent success of large language models (LLMs) has shown great potential to develop more powerfu...
Despite recent progress in open-domain dialogue evaluation, how to develop automatic metrics remains...
Large-scale pre-training has shown remarkable performance in building open-domain dialogue systems. ...
Conversational Recommender System (CRS), which aims to recommend high-quality items to users through...
Realizing general-purpose language intelligence has been a longstanding goal for natural language pr...
Dialogue systems and large language models (LLMs) have gained considerable attention. However, the d...
Building a universal conversational agent has been a long-standing goal of the dialogue research com...
Pre-trained language models (PrLMs) have achieved great success on a wide range of natural language ...
Research on (multi-domain) task-oriented dialog (TOD) has predominantly focused on the English langu...
Pre-trained models have proved to be powerful in enhancing task-oriented dialog systems. However, cu...
Recent advancements in reference-free learned metrics for open-domain dialogue evaluation have been ...
With the continuous emergence of Chinese Large Language Models (LLMs), how to evaluate a model's cap...
Visually-grounded dialog systems, which integrate multiple modes of communication such as text and v...
The goal of information-seeking dialogue is to respond to seeker queries with natural language utter...