ChatGPT has demonstrated impressive performance in various downstream tasks. However, in the Chinese Spelling Correction (CSC) task, we observe a discrepancy: while ChatGPT performs well under human evaluation, it scores poorly according to traditional metrics. We believe this inconsistency arises because the traditional metrics are not well-suited for evaluating generative models. Their overly strict length and phonics constraints may lead to underestimating ChatGPT's correction capabilities. To better evaluate generative models in the CSC task, this paper proposes a new evaluation metric: Eval-GCSC. By incorporating word-level and semantic similarity judgments, it relaxes the stringent length and phonics constraints. Experimental results ...
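To make the length-constraint argument concrete, here is a minimal sketch of a strict sentence-level check of the kind the abstract attributes to traditional metrics. It is an illustrative simplification, not the exact traditional metric and not Eval-GCSC; the function names and example sentences are hypothetical and do not come from the paper.

```python
from typing import Iterable, Tuple


def strict_sentence_correct(source: str, prediction: str, reference: str) -> bool:
    """Assumed simplification of a strict check: the prediction must keep
    the source length and match the reference character for character."""
    if len(prediction) != len(source):
        # Any insertion or deletion is rejected outright, even if fluent.
        return False
    return prediction == reference


def strict_accuracy(triples: Iterable[Tuple[str, str, str]]) -> float:
    """Fraction of (source, prediction, reference) triples that pass the strict check."""
    items = list(triples)
    if not items:
        return 0.0
    return sum(strict_sentence_correct(s, p, r) for s, p, r in items) / len(items)


# Hypothetical example: the source contains one misspelled character.
source = "我今天很高心"
reference = "我今天很高兴"
same_length_fix = "我今天很高兴"    # character-for-character correction
rephrased_fix = "我今天非常高兴"    # fixes the error but changes the length

examples = [
    (source, same_length_fix, reference),
    (source, rephrased_fix, reference),
]
```

Under this check the rephrased output counts as a failure even though a human reader would likely accept it, which is the kind of underestimation the abstract describes for generative models.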
The recent success of large language models (LLMs) has shown great potential to develop more powerfu...
This report provides a preliminary evaluation of ChatGPT for machine translation, including translat...
The ubiquitous adoption of Large Language Generation Models (LLMs) in programming has underscored th...
Large-scale language models (LLMs) have shown remarkable capability in various Natural Language Pr...
Pretraining-based (PT-based) automatic evaluation metrics (e.g., BERTScore and BARTScore) have been ...
The development of large language models (LLMs) such as ChatGPT has brought a lot of attention recen...
The task of Chinese Spelling Check (CSC) is crucial for identifying and rectifying spelling errors i...
Chinese spelling check (CSC) is still an open problem today. To the best of our knowledge, language ...
Chinese Grammatical Error Correction (CGEC) aims to generate a correct sentence from an erroneous se...
In this paper, we uncover a systematic bias in the evaluation paradigm of adopting large language mo...
The emergence of ChatGPT has generated much speculation in the press about its potential to disrupt ...
The lack of label data is one of the significant bottlenecks for Chinese Spelling Check (CSC). Exist...
This preliminary study consisted of two experiments. The first aimed to gauge the translation qualit...
Chinese Spelling Correction (CSC) is a task to detect and correct spelling mistakes in texts. In fac...
In recent years, Chinese Spelling Check (CSC) has been greatly improved by designing task-specific p...