With the emergence of social media and its growing popularity, there has been substantial growth in User Generated Content (UGC), which holds great potential for extracting meaningful information. Because social media content is dynamic and many Natural Language Processing (NLP) systems were originally developed for standard data, these systems suffer from performance degradation when applied to it. To address this drop in performance, normalization of non-standard data was introduced as a pre-processing step applied to non-standard texts before they are passed to downstream tasks. This thesis focuses on investigating the incorporation of the pre-trained language model BERT in normalization and the varying ...
The boom of natural language processing (NLP) is taking place in a world where more...
User-generated texts on the web are freely available and lucrative sources of data for language tech...
We present research aiming to build tools for the normalization of User-Generated Content (UGC). We ...
Language model-based pre-trained representations have become ubiquitous in nat...
As social media constitute a valuable source for data analysis for a wide range of applications, the...
One of the main characteristics of social media data is the use of non-standard language. Since NLP ...
In this work we present a taxonomy of error categories for lexical normalization, which is the task ...
Social media texts have become one of the most used forms of written language and a valuable source ...
This is an accepted manuscript of an article published by IEEE in 2018 3rd International Conference ...
The automatic analysis (parsing) of natural language is an important ingredient for many natural lan...
Existing natural language processing systems have often been designed with standard texts in mind. H...
This work explores normalization for parser adaptation. Traditionally, normalization is used as separa...
Text-to-Speech (TTS) normalization is an essential component of natural language processing (NLP) th...