Compared to the edited genres that have played a central role in NLP research, microblog texts use a more informal register with nonstandard lexical items, abbreviations, and free orthographic variation. When confronted with such input, conventional text analysis tools often perform poorly. Normalization (replacing orthographically or lexically idiosyncratic forms with more standard variants) can improve performance. We propose a method for learning normalization rules from machine translations of a parallel corpus of microblog messages. To validate the utility of our approach, we evaluate extrinsically, showing that normalizing English tweets and then translating improves translation quality (compared to translating unnormalized text...
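To make the notion of normalization concrete, the following is a minimal sketch of applying token-level normalization rules to a message. It is not the method proposed above, which learns its rules from machine translations of a parallel corpus; here the rule table `RULES`, the tokenizer pattern `TOKEN`, and the function `normalize` are all hypothetical and written by hand purely for illustration.

```python
import re

# Hypothetical, hand-written rule table (an assumption for illustration).
# A learned system would induce such mappings from data instead.
RULES = {
    "u": "you",
    "2moro": "tomorrow",
    "gr8": "great",
    "pls": "please",
}

# Split into runs of word characters or single punctuation marks.
TOKEN = re.compile(r"\w+|[^\w\s]")


def normalize(text, rules=RULES):
    """Replace each token that has a rule with its standard variant."""
    tokens = TOKEN.findall(text)
    # Look tokens up case-insensitively; tokens without a rule pass through.
    normalized = [rules.get(tok.lower(), tok) for tok in tokens]
    return " ".join(normalized)


print(normalize("see u 2moro"))
```

A context-free lookup like this cannot handle ambiguous forms or multi-word replacements, which is one reason learned, context-sensitive rules are attractive.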
This paper describes a phrase-based machine translation approach to normalize Dutch user-ge...
Microblogs have recently received widespread interest from NLP researchers. However, current tools ...
Spelling normalization is the task of normalizing non-standard words into standard words in texts, res...
One of the main characteristics of social media data is the use of non-standard language. Since NLP ...
The use of computer mediated communication has resulted in a new form of written text—Microtext—whic...
In this article we describe the microtext normalization system we have used to participate in the N...
User-generated content has become a recurrent resource for NLP tools and applications, hence many ...
Tweets often contain a large proportion of abbreviations, alternative spellings, novel words and oth...
The rapid spread of blogs, microblogs, and social network services has accelerated the u...
Short Messaging Service (SMS) texts behave quite differently from normal written texts and have som...
Social media texts have become one of the most used forms of written language and a valuable source ...
The automatic analysis (parsing) of natural language is an important ingredient for many natural lan...
Social media language contains a huge number and wide variety of nonstandard tokens, created both int...
I propose a text normalization model based on learning edit operations from labeled data while incor...
The proliferation of Web 2.0 technologies and the increasing use of computer-mediated communication ...