© 2014 Dr. Bo Han
Social media has been an attractive target for many natural language processing (NLP) tasks and applications in recent years. However, the unprecedented volume of data and the non-standard language register cause problems for off-the-shelf NLP tools. This thesis investigates the broad question of how NLP-based text processing can improve the utility (i.e., the effectiveness and efficiency) of social media data. In particular, text normalisation and geolocation prediction are closely examined in the context of Twitter text processing. Text normalisation is the task of restoring non-standard words to their standard forms. For instance, earthquick and 2morrw should be transformed into “earthquake” and “tomorrow”, respective...
Social media texts have become one of the most used forms of written language and a valuable source ...
As social media constitute a valuable source for data analysis for a wide range of applications, the...
The language used in social media is often characterized by the abundance of informal and non-standa...
Social media language contains huge amount and wide variety of nonstandard tokens, created both int...
Twitter is one of the most important data sources in social data analysis. However, the text contained on Tw...
Text normalization is an indispensable stage for natural language processing of social media data wi...
One of the main characteristics of social media data is the use of non-standard language. Since NLP ...
One of the major problems in the era of big data use is how to ‘clean’ the vast amount of data on th...
Social media such as Twitter, Reddit, and Facebook, have become de facto global communication channe...
Geographical location is vital to geospatial applications like local search and event detection. In ...
This is an accepted manuscript of an article published by IEEE in 2018 3rd International Conference ...
Tweets often contain a large proportion of abbreviations, alternative spellings, novel words and oth...
Inferring the location of a user has been a valuable step for many applications that leverage social...
The ever-growing usage of social media platforms generates daily vast amounts of textual data which ...