We present research aiming to build tools for the normalization of User-Generated Content (UGC). We argue that processing this type of text requires revisiting the initial steps of Natural Language Processing. UGC (micro-blogs, blogs and, more generally, Web 2.0 user-generated texts) has a number of nonstandard communicative and linguistic characteristics, and is often closer to oral and colloquial language than to edited text. We present a corpus of UGC text in Spanish from three different sources (Twitter, consumer reviews, and blogs) and describe its main characteristics. We motivate the need for UGC text normalization by analyzing the problems found when processing this type of text through a conventional language processing pipel...
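One problem a conventional pipeline runs into can be shown with a minimal sketch: a lexicon lookup flags many tokens of a typical tweet as unknown. The lexicon, the tokenizer pattern and the `find_oov` helper below are hypothetical illustrations. They are not the corpus, tools or pipeline described in the abstract.

```python
import re

# Hypothetical miniature lexicon for illustration; a real pipeline would
# load a full Spanish dictionary or the vocabulary of its tagger.
LEXICON = {"hola", "que", "tal", "porque", "también", "te", "quiero", "mucho"}

# Order matters: URLs first, then @mentions/#hashtags, then words, then
# single punctuation marks, so URLs and mentions stay single tokens.
TOKEN = re.compile(r"https?://\S+|[@#]\w+|\w+|[^\w\s]")
WORD = re.compile(r"[^\W\d_]+")  # alphabetic tokens only (Unicode-aware)


def find_oov(text: str) -> list[str]:
    """Return word tokens that the lexicon does not recognise.

    URLs, mentions, hashtags, numbers and punctuation are skipped, since
    this sketch does not treat them as normalization targets.
    """
    tokens = TOKEN.findall(text)
    return [t for t in tokens if WORD.fullmatch(t) and t.lower() not in LEXICON]


print(find_oov("@ana Holaaa!! q tal, t kiero mucho http://t.co/x"))
# ['Holaaa', 'q', 't', 'kiero']
```

Four of the six words in this toy tweet fall outside the lexicon. A tagger or parser that relies on such a lexicon would have to guess at each of them.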
The writing style used in social media usually contains informal elements that can lower the perform...
User-Generated Content (UGC) is the name given to content created spontaneously by in...
In this paper we introduce TweetNorm_es, an annotated corpus of tweets in Spanish, which we...
We present research aiming to build tools for the normalization of User-Generated Content (UGC). We ...
We present work in progress aiming to build tools for the normalization of User-Generated Content (U...
We present work in progress aiming to build tools for the normalization of User-Generated Content (U...
The language used in social media is often characterized by the abundance of informal and non-standa...
The language used in social media is often characterized by the abundance of informal and non-standa...
This paper presents a system to normalize Spanish tweets, which uses preprocessing rules, a domain-a...
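Rule-based preprocessing of this kind can be sketched minimally as candidate generation plus lexicon lookup. The rules here are three common Spanish microtext phenomena, chosen only as assumptions for illustration: emphatic letter repetition ("holaaa"), SMS abbreviations ("q", "xq") and phonetic respelling ("kiero"). They are not the actual rules or resources of the system in the abstract. The lexicon, the abbreviation table and all function names are hypothetical.

```python
import re
from collections.abc import Iterator

# Hypothetical toy resources for illustration only; a real normalizer
# would rely on a full Spanish lexicon and a curated abbreviation list.
LEXICON = {"hola", "que", "tal", "porque", "también", "te", "quiero", "llamar", "mucho"}
ABBREVIATIONS = {"q": "que", "xq": "porque", "tb": "también", "t": "te"}

REPEATS = re.compile(r"([^\W\d_])\1{2,}")  # a letter repeated 3 or more times


def candidates(token: str) -> Iterator[str]:
    """Yield candidate standard forms of a lowercased token, most conservative first."""
    yield token
    if token in ABBREVIATIONS:
        yield ABBREVIATIONS[token]
    # Emphatic lengthening: try one letter ("holaaa" -> "hola"),
    # then two, for legitimate doubles ("llllamar" -> "llamar").
    single = REPEATS.sub(r"\1", token)
    yield single
    yield REPEATS.sub(r"\1\1", token)
    # Phonetic spelling: "k" before e/i is written "qu" in standard Spanish.
    yield re.sub(r"k(?=[ei])", "qu", single)


def normalize_token(token: str) -> str:
    """Return the first in-lexicon candidate (lowercased), else the token unchanged."""
    for cand in candidates(token.lower()):
        if cand in LEXICON:
            return cand
    return token


def normalize(text: str) -> str:
    # Whitespace tokenization only; punctuation stays attached to words.
    return " ".join(normalize_token(t) for t in text.split())


print(normalize("Holaaa q tal, t kiero mucho"))
# hola que tal, te quiero mucho
```

Ordering the candidates from least to most aggressive keeps valid words intact before any rewrite is tried. Unknown tokens such as named entities are returned unchanged rather than forced onto the closest lexicon entry. A fuller system would rank competing candidates with a language model instead of taking the first match.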
In this article we describe the microtext normalization system we used to participate in the N...
The boom of natural language processing (NLP) is taking place in a world where more...
User-generated content has become a recurrent resource for NLP tools and applications, hence many ...
User-generated content (UGC) represents an important source of information for governments, companie...
We present a system to normalize Spanish tweets, which uses preprocessing rule...
Text normalization is an indispensable stage for natural language processing of social media data wi...