The informal nature of social media text makes it very difficult to process automatically with natural language processing tools. Text normalization, which corresponds to restoring non-standard words to their canonical forms, provides a solution to this challenge. We introduce an unsupervised text normalization approach that utilizes not only lexical, but also contextual and grammatical features of social text. The contextual and grammatical features are extracted from a word association graph built using a large unlabeled social media text corpus. The graph encodes the relative positions of the words with respect to each other, as well as their part-of-speech tags. The lexical features are obtained by using the longest c...
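A minimal sketch of how a word association graph of this kind might be built and queried. It assumes the corpus is already tokenized and POS-tagged, uses (word, tag) pairs as nodes, and keys each edge by the relative distance between the two nodes. The window size of 3 and the functions `build_association_graph` and `context_score` are hypothetical illustrations, not the authors' implementation.

```python
from collections import defaultdict


def build_association_graph(tagged_sentences, window=3):
    """Count how often one (word, tag) node follows another at each distance.

    Assumption: sentences are lists of (token, POS tag) pairs and the window
    size is an illustrative choice. Edge key: (left_node, right_node, distance).
    """
    graph = defaultdict(int)
    for sentence in tagged_sentences:
        for i, left in enumerate(sentence):
            for d in range(1, window + 1):
                j = i + d
                if j >= len(sentence):
                    break
                graph[(left, sentence[j], d)] += 1
    return graph


def context_score(graph, candidate, left_context):
    """Sum edge counts linking the preceding context nodes to a candidate node.

    left_context lists the preceding (word, tag) nodes, nearest one last.
    """
    score = 0
    for d, node in enumerate(reversed(left_context), start=1):
        # .get avoids inserting missing keys into the defaultdict
        score += graph.get((node, candidate, d), 0)
    return score


if __name__ == "__main__":
    corpus = [
        [("i", "PRP"), ("am", "VBP"), ("going", "VBG"), ("home", "NN")],
        [("you", "PRP"), ("are", "VBP"), ("going", "VBG"), ("out", "RP")],
    ]
    g = build_association_graph(corpus)
    context = [("i", "PRP"), ("am", "VBP")]
    print(context_score(g, ("going", "VBG"), context))  # 2
    print(context_score(g, ("gone", "VBN"), context))   # 0
```

A noisy token such as "goin" after "i am" could then be resolved by comparing the context scores of candidate canonical forms. How candidates are generated, and how these contextual scores are combined with the lexical features, is not specified in the excerpt above.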
One of the main characteristics of social media data is the use of non-standard language. Since NLP ...
Existing natural language processing systems have often been designed with standard texts in mind. H...
I propose a text normalization model based on learning edit operations from labeled data while incor...
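The abstract above is truncated, so the following is only an illustrative sketch of what learning edit operations from labeled data can look like. Character-level edits are extracted from labeled (noisy, canonical) pairs with a standard unit-cost Levenshtein alignment and then counted. The functions `edit_operations` and `learn_edit_counts` are hypothetical names, and the unit-cost alignment is an assumption, not necessarily the model proposed in the paper.

```python
from collections import Counter


def edit_operations(noisy, canonical):
    """Return a minimal list of character edits turning `noisy` into `canonical`.

    Unit-cost Levenshtein dynamic program plus a backtrace. Operations are
    ("ins", char), ("del", char) or ("sub", old, new); matches are omitted.
    """
    n, m = len(noisy), len(canonical)
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if noisy[i - 1] == canonical[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # delete from noisy
                             dist[i][j - 1] + 1,         # insert into noisy
                             dist[i - 1][j - 1] + cost)  # match or substitute
    # Walk back from the bottom-right cell along an optimal path.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        mismatch = i > 0 and j > 0 and noisy[i - 1] != canonical[j - 1]
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + mismatch:
            if mismatch:
                ops.append(("sub", noisy[i - 1], canonical[j - 1]))
            i, j = i - 1, j - 1
        elif j > 0 and dist[i][j] == dist[i][j - 1] + 1:
            ops.append(("ins", canonical[j - 1]))
            j -= 1
        else:
            ops.append(("del", noisy[i - 1]))
            i -= 1
    ops.reverse()
    return ops


def learn_edit_counts(pairs):
    """Count edit operations over labeled (noisy, canonical) pairs."""
    counts = Counter()
    for noisy, canonical in pairs:
        counts.update(edit_operations(noisy, canonical))
    return counts


if __name__ == "__main__":
    counts = learn_edit_counts([("gud", "good"), ("tmrw", "tomorrow")])
    print(counts.most_common(3))
```

Normalized counts like these could serve as edit probabilities in a scoring model; how the original work weights them against other features is not recoverable from the truncated text.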
Social media language contains a huge amount and wide variety of nonstandard tokens, created both int...
In this work, we adapt the traditional framework for spelling correction to the more novel task of n...
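Assuming the traditional framework for spelling correction refers to the noisy-channel formulation, which picks the candidate w that maximizes P(w) P(s | w) for an observed token s, a toy version might look like the sketch below. The unigram prior, the channel whose log-probability is `-penalty * edit_distance`, the `max_dist=2` candidate cutoff, and the function names are all assumptions for illustration.

```python
import math
from collections import Counter


def levenshtein(a, b):
    """Unit-cost edit distance between strings a and b (two-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # match / substitution
        prev = cur
    return prev[-1]


def noisy_channel_normalize(token, word_counts, max_dist=2, penalty=2.0):
    """Return argmax_w log P(w) + log P(token | w) over in-vocabulary words w.

    P(w) is a unigram estimate from word_counts. The channel is a crude
    assumption: log P(token | w) = -penalty * edit_distance(token, w).
    Words farther than max_dist are not considered; if none qualify, the
    token is returned unchanged.
    """
    total = sum(word_counts.values())
    best, best_score = token, -math.inf
    for word, count in word_counts.items():
        d = levenshtein(token, word)
        if d > max_dist:
            continue
        score = math.log(count / total) - penalty * d
        if score > best_score:
            best, best_score = word, score
    return best


if __name__ == "__main__":
    counts = Counter({"the": 50, "they": 10, "tomorrow": 5, "good": 20})
    print(noisy_channel_normalize("teh", counts))  # the
    print(noisy_channel_normalize("gud", counts))  # good
```

Here "teh" is two edits from both "the" and "they", so the prior decides in favor of the more frequent "the". Because a unigram prior ignores context, a real normalization system might replace it with a language model over the surrounding words.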
Natural Language Processing (NLP) on data from social network services (SNSs) became more diffic...
User-generated texts on the web are freely available and lucrative sources of data for language tech...
Text normalization is an indispensable stage for natural language processing of social media data wi...
This is an accepted manuscript of an article published by IEEE in 2018 3rd International Conference ...
Various models have been developed for normalizing informal text. In this paper, we propose two meth...
is one of the most important data sources in social data analysis. However, the text contained on Tw...
The boom of natural language processing (NLP) is taking place in a world where more...
The use of computer-mediated communication has resulted in a new form of written text, Microtext, whic...
The ever-growing use of social media platforms generates vast amounts of textual data daily, which ...