See the file ManyTypes4PyDataset.spec for the repository URLs and their commit SHAs. The dataset was gathered on Sep. 17th 2020. It has more than 5.4K Python repositories that are hosted on GitHub, and it contains more than 1.1M type annotations. Please note that this is the first version of the dataset. In the second version, we will provide processed Python projects in JSON files that contain relevant features and hints for the ML-based type inference task.
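As a rough illustration of how the repository URLs and commit SHAs could be used to restore each project at its recorded state, here is a minimal sketch. It assumes that each line of the spec file holds a repository URL and a commit SHA separated by whitespace; that layout is an assumption, not something stated above, so check the actual file first. The names `RepoPin`, `parse_spec`, and `checkout_commands` are hypothetical helpers introduced only for this example.

```python
from typing import List, NamedTuple


class RepoPin(NamedTuple):
    """One repository pinned to a specific commit (hypothetical helper type)."""
    url: str
    commit: str


def parse_spec(text: str) -> List[RepoPin]:
    """Parse lines of the ASSUMED form '<repository URL> <commit SHA>'."""
    pins = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank lines and lines that do not match the assumed layout
        pins.append(RepoPin(url=parts[0], commit=parts[1]))
    return pins


def checkout_commands(pin: RepoPin, dest: str) -> List[List[str]]:
    """Build the git commands that would clone a repository and check out its pinned commit."""
    return [
        ["git", "clone", pin.url, dest],
        ["git", "-C", dest, "checkout", pin.commit],
    ]


# Made-up sample content; the real file may use a different layout.
sample = "https://github.com/example/project.git 0123abc\n"
pins = parse_spec(sample)
commands = checkout_commands(pins[0], "repos/project")
```

The sketch only builds the command lists rather than running them, so it can be inspected before cloning thousands of repositories; each list could then be passed to `subprocess.run(cmd, check=True)`.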
Empirical data and code used in paper *Towards a Large-Scale Empirical Study of Python3 Type Annotat...
Code Token Type Taxonomy (CT3) is a methodology for refined evaluation of ML-based code completion a...
This repository contains the dataset of the manuscript: "An Empirical Study on the Usage and Availa...
The dataset is gathered on Sep. 17th 2020. It has more than 5.4K Python repositories that are hosted...
The dataset is gathered on Sep. 17th 2020 from GitHub. It has clean and complete versions (from v0....
In this paper, we present ManyTypes4Py, a large Python dataset for machine learning (ML)-based type ...
This contains artifacts for the Type4Py paper, which is accepted at the ICSE'22 technical track. ...
This dataset contains python repositories mined on GitHub on January 20, 2021. It allows a cross-dom...
In this paper, we present ManyTypes4TypeScript, a very large corpus for training and evaluating mach...
A new version of the ManyTypes4Py dataset could be used to train and evaluate the TypePy model For ...
Researchers at the Delft University of Technology have developed Type4Py: a tool that uses Machine L...
Dynamic programming languages (DPLs), such as Python and Ruby, are often used for their flexibility ...
Optional type annotations allow for enriching dynamic programming languages with static typing featu...
We present Typpete, a sound type inferencer that automatically infers Python 3 type annotations. Typ...
We present Code4ML: a Large-scale Dataset of annotated Machine Learning Code, a corpus of Python cod...