peer-reviewed

Content analysis is a useful approach for analyzing unstructured software project data, but it is labor-intensive and slow. Can automated text classification (using supervised machine learning) reduce the labor or improve the speed of content analysis? We conducted a case study using data from a previous study that applied content analysis to an open source software project. From a human-coded data set of 3256 samples, we created training sets ranging in size from 100 to 3000 samples and used them to train an “ensemble” text classifier that assigns one of five categories to each sample in a test set. The results show that the automated classifier could be trained to recognize categories, but m...
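The experimental design described in the abstract (training sets of increasing size, one fixed test set, an ensemble that assigns a single category) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the study's implementation: the function names `majority_vote`, `size_sweep`, and `train_majority_baseline` are hypothetical, majority voting is just one possible way to combine ensemble members, and the held-out test-set size and the exact training-set sizes are not given in the text above.

```python
import random
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the most ensemble members."""
    return Counter(predictions).most_common(1)[0][0]


def size_sweep(samples, sizes, train_members, test_size, seed=0):
    """Train on nested subsets of increasing size; score on one fixed test set.

    samples: list of (text, label) pairs coded by humans.
    sizes: training-set sizes to try (e.g. values between 100 and 3000).
    train_members: hypothetical callable that takes training pairs and
        returns a list of classifiers, each a function text -> label.
    test_size: number of samples held out for testing (an assumption here).
    """
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    # Hold out the test set first so no training subset can overlap it.
    test, pool = shuffled[:test_size], shuffled[test_size:]
    results = {}
    for n in sizes:
        if n > len(pool):
            raise ValueError(f"training size {n} exceeds pool of {len(pool)}")
        members = train_members(pool[:n])
        correct = sum(
            majority_vote([member(text) for member in members]) == label
            for text, label in test
        )
        results[n] = correct / len(test)  # accuracy at this training size
    return results


def train_majority_baseline(train):
    """Toy single-member 'ensemble': always predicts the most frequent label."""
    top = Counter(label for _, label in train).most_common(1)[0][0]
    return [lambda text: top]
```

With real data, `train_members` would wrap whatever supervised learners make up the ensemble, and the returned dictionary maps each training-set size to test accuracy, which is the quantity needed to judge how much human coding is required before the classifier becomes useful.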
The increasing availability of digitized text presents enormous opportunities for social scientists....
Analysts have estimated that more than 80 percent of today’s data is stored in unstructured form (e....
Context: Automated classifiers, often based on machine learning (ML), are increasingly used in softw...
In the paper, the authors are presenting the outcome of web scraping software allowing for the autom...
Text is becoming a central source of data for social science research. With advances in digitization...
Purpose: The authors aim at testing the performance of a set of machine learning algorithms that cou...
Text mining is drawing enormous attention in this era as there is a huge amount of text data getting...
The automated categorization (or classification) of texts into predefined categories has witnessed a...
Technology Watch human agents have to read many documents in order to manually categorize and dispat...
A mapping between a system's implementation and its software architecture is mandatory in many archi...
Text classification via supervised learning involves various steps from processing raw data, featur...
Extracting meaningful information from large collections of text data is problematic because of the ...
There are several reasons why one would want an automated system for content analysis of text. It i...
Many applications in text processing require significant human effort for either labeling large docu...