ABSTRACT. Social scientists interested in mixed-methods research have traditionally turned to human annotators to classify the documents or events used in their analyses. The rapid growth of digitized government documents in recent years presents new opportunities for research but also new challenges. With more and more data coming online, relying on human annotators becomes prohibitively expensive for many tasks. For researchers interested in saving time and money while maintaining confidence in their results, we show how a particular supervised learning system can provide estimates of the class of each document (or event). This system maintains high classification accuracy and provides accurate estimates of document proportions, while...
We describe a new family of topic-ranking algorithms for multi-labeled documents. The motivation for...
Automatic document classification techniques have been widely advocated for the study of various fi...
It is estimated that the world’s data will increase to roughly 160 billion terabytes by 2025, with m...
Text is becoming a central source of data for social science research. With advances in digitization...
The increasing availability of digitized text presents enormous opportunities for social scientists....
Since 1995 the techniques and capacities to store new electronic data and to make it available to ma...
This paper describes an experiment in applying standard supervised machine learning algorithms (C4.5...
As the dramatic expansion of online publications continues, state libraries urgently need effective ...
With the exponential growth of scholarly data during the past few years, effective methods for topic...
Due in large part to the proliferation of digitized text, much of it available for little or no cost...
While automated methods for information organization have been around for several decades now, expon...
Topic indexing is the task of identifying the main topics covered by a document. These are useful fo...
The article addresses the problem of document classification. A technology for automatic topic extra...
The outcomes of both experiments suggest that topics derived from purely textual data implicitly cap...
Social scientists often classify text documents to use the resulting labels as an outcome or a predi...