Obtaining a relevant dataset is central to conducting empirical studies in software engineering. However, in the context of mining software repositories, the lack of appropriate tooling for large-scale mining tasks hinders the creation of new datasets. Moreover, limitations related to data sources that change over time (e.g., code bases) and the lack of documentation of extraction processes make it difficult to reproduce datasets over time. This threatens the quality and reproducibility of empirical studies. In this paper, we propose a tool-supported approach that facilitates the creation of large tailored datasets while ensuring their reproducibility. We leveraged all the sources feeding the Software Heritage append-only archive which are acce...
Software repositories have been getting a lot of attention from researchers in recent years. In orde...
Short: This data set contains three snapshots from an evolving software project data set and a scrip...
Program understanding aims at discovering human-readable properties of a softw...
The need for automated software engineering tools and techniques continues to grow as the size and ...
Background: After many years of research on software repositories, the knowledge for building mature,...
Software repositories contain a vast wealth of information about software development. Mining these ...
Background: Software repositories provide a large amount of data encompassing software changes through...
We present a software framework for mining software repositories. Our extensible framework enables t...
This paper is the result of reviewing all papers published in the proceedings of the former...
Software Heritage is the largest existing public archive of software source code and accompanying de...
Software metrics are a useful tool for assessing software quality and for making predictions. But cu...
This is the Excel spreadsheet dataset containing our analysis of papers performing mining software r...
The primary goal of software development is to deliver Optimal Software, i.e., software produced at...
The software files, such as the version repositories of errors, stored much of the activity related ...