We address the problem of the categorization of short texts, like those posted by users on social networks and microblogging platforms. We specifically focus on Twitter. Since short texts do not provide sufficient word occurrences, and they often contain abbreviations and acronyms, traditional classification methods such as "Bag-of-Words" have limitations. Our proposed method enriches the original text with a new set of words, to add more semantic value by using information extracted from webpages of the same temporal context. Then we use those words to query Wikipedia, as an external knowledge base, with the final goal to categorize the original text using a predefined set of Wikipedia categories. We also present a first experimental evalu...
Two challenging issues are notable in tweet clustering. Firstly, the sparse data problem is serious ...
Classification of short text messages is becoming more and more relevant in these years, where billi...
When humans approach the task of text categorization, they interpret the specific wording of the doc...
We focus on entity extraction and disambiguation in short text communications, which have experience...
With the huge growth of social media, especially with 500 million Twitter messages being posted per ...
ii In micro-blogging services such as Twitter, the users may get overwhelmed by the raw data. One so...
Short texts, due to their nature which makes them full of abbreviations and new coined acronyms, are...
Twitter is a microblogging service that allows people to communicate via messages containing only 14...
AbstractIn this paper, we propose a novel approach to classify short texts by combining both their l...
With each passing minute, online data is growing exponentially. A bulk of such data is generated fro...
Understanding short texts is crucial to many applications, but challenges abound. First, short texts...
The process whereby inferences are made from textual data is broadly referred to as text mining. In ...
The content of the World Wide Web is drastically multiplying, and thus the amount of available onlin...
Abstract Classifying short texts to one category or clustering semantically related texts is challen...
International audienceThe tweet contextualization task aims at providing an automatic readable summa...
Two challenging issues are notable in tweet clustering. Firstly, the sparse data problem is serious ...
Classification of short text messages is becoming more and more relevant in these years, where billi...
When humans approach the task of text categorization, they interpret the specific wording of the doc...
We focus on entity extraction and disambiguation in short text communications, which have experience...
With the huge growth of social media, especially with 500 million Twitter messages being posted per ...
ii In micro-blogging services such as Twitter, the users may get overwhelmed by the raw data. One so...
Short texts, due to their nature which makes them full of abbreviations and new coined acronyms, are...
Twitter is a microblogging service that allows people to communicate via messages containing only 14...
AbstractIn this paper, we propose a novel approach to classify short texts by combining both their l...
With each passing minute, online data is growing exponentially. A bulk of such data is generated fro...
Understanding short texts is crucial to many applications, but challenges abound. First, short texts...
The process whereby inferences are made from textual data is broadly referred to as text mining. In ...
The content of the World Wide Web is drastically multiplying, and thus the amount of available onlin...
Abstract Classifying short texts to one category or clustering semantically related texts is challen...
International audienceThe tweet contextualization task aims at providing an automatic readable summa...
Two challenging issues are notable in tweet clustering. Firstly, the sparse data problem is serious ...
Classification of short text messages is becoming more and more relevant in these years, where billi...
When humans approach the task of text categorization, they interpret the specific wording of the doc...