In this poster we describe a pilot study of searching social science literature for legacy corpora to evaluate text mining algorithms. The emerging field of computational social science demands large amounts of social science data to train and evaluate computational models. We argue that the legacy corpora that were annotated by social science researchers through traditional Qualitative Data Analysis (QDA) are ideal data sets for evaluating text mining methods, such as text categorization and clustering. As a pilot study, we searched for articles that involve content analysis and discourse analysis in leading communication journals, and then contacted the authors regarding the availability of the annotated texts. Regretfully, nearly all of the ...
Text mining has developed rapidly in recent years. In this article we compare classification methods...
Helge-Johannes Marahrens is a fourth year doctoral student in the department of Sociology at Indiana...
Text data mining (TDM) is the computational and statistical analysis of large corpora of texts. Ofte...
The emergence of big data and computational tools has introduced new possibilities for using large-s...
Current text mining applications statistically work on the basis of linguistic models and theories a...
Natural language corpora are phenomenally rich resources for learning about people and society, and ...
Text, the written representation of human thought and communication in natural language, has been a ...
We identify three gaps that limit the utility and obstruct the progress of computational text analys...
Two developments in computational text analysis may change the way qualitative data analysis in soci...
We explore the use of natural language processing technologies to assist in content and communicatio...
This article introduces a process for computational text classification that can be used in a variet...
In support of data-intensive research and inquiry, data curation has been recognized as an emerging ...
Using text mining and visualization techniques to identify the topical coverage of text corpora is i...
The increasing availability of digitized text presents enormous opportunities for social scientists....