One of the major challenges in the era of big data is how to 'clean' the vast amount of data, particularly from micro-blog websites such as Twitter. Twitter messages, called tweets, are commonly ill-formed, containing abbreviations, repeated characters, and misspelled words. These 'noisy tweets' require text normalisation techniques to detect them and convert them into more accurate English sentences. Several techniques have been proposed to solve these issues; however, each has limitations and therefore cannot achieve good overall results. This paper aims to evaluate individual existing statistical normalisation methods and their possible combinations in order to find the best combination that can efficientl...
The proliferation of Web 2.0 technologies and the increasing use of computer-mediated communication ...
The use of computer mediated communication has resulted in a new form of written text—Microtext—whic...
This paper describes a text normalization system for deletion-based abbreviations in informal text. ...
One of the major problems in the era of big data use is how to ‘clean’ the vast amount of data on th...
Twitter is one of the most important data sources in social data analysis. However, the text contained on Tw...
While parsing performance on in-domain text has developed steadily in recent years, out-of-domain te...
In this article we describe the microtext normalization system we have used to participate in the N...
Tweets often contain a large proportion of abbreviations, alternative spellings, novel words and oth...
The ever-growing usage of social media platforms generates daily vast amounts of textual data which ...
One of the main characteristics of social media data is the use of non-standard language. Since NLP ...
The language used in social media is often characterized by the abundance of informal and non-standa...
Social media platforms such as Twitter have grown at a tremendous pace in recent years and have beco...