Natural Language Generation (NLG) for non-English languages is hampered by the scarcity of datasets in these languages. In this paper, we present the IndicNLG Benchmark, a collection of datasets for benchmarking NLG for 11 Indic languages. We focus on five diverse tasks, namely, biography generation using Wikipedia infoboxes, news headline generation, sentence summarization, paraphrase generation, and question generation. We describe the created datasets and use them to benchmark the performance of several monolingual and multilingual baselines that leverage pre-trained sequence-to-sequence models. Our results exhibit the strong performance of multilingual language-specific pre-trained models, and the utility of models trained on our datase...
We introduce GEM, a living benchmark for natural language Generation (NLG), it...
The recent advances in deep learning have led to the development of highly sophisticated systems wit...
NLP technologies are uneven for the world's languages as the state-of-the-art models are only availa...
In this work, we introduce IndicXTREME, a benchmark consisting of nine diverse tasks covering 18 lan...
In this paper, we study pre-trained sequence-to-sequence models for a group of related languages, wi...
We introduce MTG, a new benchmark suite for training and evaluating multilingual text generation. It...
A cornerstone in AI research has been the creation and adoption of standardized training and test da...
Multilingual evaluation benchmarks usually contain limited high-resource languages and do not test m...
We present Belebele, a multiple-choice machine reading comprehension (MRC) dataset spanning 122 lang...
Availability of challenging benchmarks is the key to advancement of AI in a specific field. Since Leg...
We would like to thank Google’s TPU Research Cloud program for providing us with free and unlimited ...
Natural language processing (NLP) has a significant impact on society via technologies such as machi...
One of the biggest challenges of natural language generation (NLG) is the proper handling of named e...
In order for NLP technology to be widely applicable and useful, it needs to be inclusive of users ac...
A large percentage of the world’s population speaks a language of the Indian subcontinent, what we w...