Multilingual evaluation benchmarks usually cover only a limited set of high-resource languages and do not test models for specific linguistic capabilities. CheckList is a template-based evaluation approach that tests models for specific capabilities. The CheckList template creation process requires native speakers, which makes it hard to scale to hundreds of languages. In this work, we explore multiple approaches to generating Multilingual CheckLists. We devise an algorithm, the Template Extraction Algorithm (TEA), for automatically extracting target-language CheckList templates from machine-translated instances of source-language templates. We compare the TEA CheckLists with CheckLists created with different levels of human intervention. We further intro...
In this paper, we study pre-trained sequence-to-sequence models for a group of related languages, wi...
In this paper, we introduce DOCmT5, a multilingual sequence-to-sequence language model pretrained wi...
This research investigates the effectiveness of ChatGPT, an AI language model by OpenAI, in translat...
Large Language Models (LLMs) have demonstrated impressive performance on Natural Language Processing...
In this work, we introduce IndicXTREME, a benchmark consisting of nine diverse tasks covering 18 lan...
We present a systematic study and comprehensive evaluation of large language models for automatic mu...
Natural Language Generation (NLG) for non-English languages is hampered by the scarcity of datasets ...
We introduce MTG, a new benchmark suite for training and evaluating multilingual text generation. It...
We present Belebele, a multiple-choice machine reading comprehension (MRC) dataset spanning 122 lang...
Although recent Massively Multilingual Language Models (MMLMs) like mBERT and XLMR support around 10...
While understanding and removing gender biases in language models has been a long-standing problem i...
Recent benchmarks for Large Language Models (LLMs) have mostly focused on application-driven tasks s...
Current research on automatic readability assessment (ARA) has focused on improving the performance ...
A cornerstone in AI research has been the creation and adoption of standardized training and test da...
We introduce MADLAD-400, a manually audited, general domain 3T token monolingual dataset based on Co...