Language model-based pre-trained representations have become ubiquitous in natural language processing. They have been shown to significantly improve the performance of neural models on a great variety of tasks. However, it remains unclear how useful those general models can be in handling non-canonical text. In this article, focusing on User Generated Content (UGC) in a resource-scarce scenario, we study the ability of BERT (Devlin et al., 2018) to perform lexical normalisation. Our contribution is simple: by framing lexical normalisation as a token prediction task, by enhancing its architecture and by carefully fine-tuning it, we show that BERT can be a competitive lexical normalisation model without the need of a...
Various models have been developed for normalizing informal text. In this paper, we propose two meth...
The automatic analysis (parsing) of natural language is an important ingredient for many natural lan...
This work explores normalization for parser adaptation. Traditionally, normalization is used as separa...
With the emergence of social media and its growing popularity, there has been substantial growth in ...
One of the most persistent characteristics of written user-generated content (UGC) is the use of non...
In this work we present a taxonomy of error categories for lexical normalization, which is the task ...
The boom of natural language processing (NLP) is taking place in a world where more...
In this paper we present a Dutch and English dataset that can serve as a gold standard for evaluatin...
Though achieving impressive results on many NLP tasks, the BERT-like masked language models (MLM) en...
We present research aiming to build tools for the normalization of User-Generated Content (UGC). We ...
This paper describes the HEL-LJU submissions to the MultiLexNorm shared task on multilingual lexical...
One of the main characteristics of social media data is the use of non-standard language. Since NLP ...