Given the growing need to process texts quickly and extract information from them for various purposes, correct normalization that contributes to better and faster processing is of great importance. This paper compares different methods of normalizing short texts (tweets), using text sentiment analysis as the example task. The results of applying the different normalizations are presented in terms of time complexity and the classification accuracy of the sentiment algorithm. Normalization by cutting words to n-grams is shown to give better or similar results compared to language-dependent normalizations. Taking the time complexity into account as well, it is concluded that the application of...
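As a rough illustration of the n-gram cutting idea mentioned in the abstract, the sketch below assumes that "cutting to n-gram" means keeping only the first n characters of each word, which may differ from the paper's exact procedure. The function name `cut_to_ngram`, the default n = 4, and the simple regex tokenization are all hypothetical choices made for this example.

```python
import re


def cut_to_ngram(text: str, n: int = 4) -> str:
    """Lowercase the text and truncate every word token to its first n characters.

    Assumption: "cutting to n-gram" is read here as prefix truncation;
    the tokenization (runs of word characters) is a simplification that
    drops punctuation such as '#' and '@'.
    """
    tokens = re.findall(r"\w+", text.lower())
    return " ".join(token[:n] for token in tokens)


if __name__ == "__main__":
    print(cut_to_ngram("Normalization helps processing", n=4))
```

Because truncation needs no dictionary, stemmer, or lemmatizer, a normalization of this kind is language-independent and runs in time linear in the length of the text.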
Social media texts have become one of the most used forms of written language and a valuable source ...
As social media constitute a valuable source for data analysis for a wide range of applications, the...
The language used in social media is often characterized by the abundance of informal and non-standa...
The proliferation of Web 2.0 technologies and the increasing use of computer-mediated communication ...
In this paper we discuss the parallel manual normalisation of samples extracted from Croatian and Se...
Sentiment analysis in the most general sense refers to the classification of a piece of text into ei...
In the past decade, sentiment analysis research has thrived, especially on social media. While this ...
The writing style used in social media usually contains informal elements that can lower the perform...
Twitter is one of the most important data sources in social data analysis. However, the text contained on Tw...
The ever-growing usage of social media platforms generates daily vast amounts of textual data which ...
One of the main characteristics of social media data is the use of non-standard language. Since NLP ...
One of the major challenges in the era of big data use is how to 'clean' the vast amount of data, pa...
User generated texts on the web are freely-available and lucrative sources of data for language tech...