In this paper, we present ManyTypes4Py, a large Python dataset for machine learning (ML)-based type inference. The dataset contains a total of 5,382 Python projects with more than 869K type annotations. Duplicate source code files were removed to eliminate the negative effect of duplication bias. To facilitate training and evaluation of ML models, the dataset was split into training, validation, and test sets by file. To extract type information from abstract syntax trees (ASTs), a lightweight static analyzer pipeline was developed and accompanies the dataset. Using this pipeline, the collected Python projects were analyzed and the results of the AST analysis were stored in JSON-formatted files. The ManyTypes4Py dataset is shared ...
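As a rough illustration of the AST-based extraction step mentioned in the abstract above, the sketch below collects function parameter and return annotations with Python's standard `ast` module and prints them as JSON. It is not the ManyTypes4Py pipeline itself: the function name `extract_annotations` and the output layout are assumptions made for this example, and `ast.unparse` requires Python 3.9 or later.

```python
import ast
import json


def extract_annotations(source: str) -> dict:
    """Collect parameter and return annotations for each function in `source`.

    Hypothetical helper for illustration; the JSON layout is an assumption,
    not the format used by the ManyTypes4Py dataset.
    """
    tree = ast.parse(source)
    functions = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Positional-only, regular, and keyword-only parameters.
            params = node.args.posonlyargs + node.args.args + node.args.kwonlyargs
            functions[node.name] = {
                "params": {
                    p.arg: ast.unparse(p.annotation) if p.annotation else None
                    for p in params
                },
                "returns": ast.unparse(node.returns) if node.returns else None,
            }
    return functions


example = "def area(w: float, h: float = 1.0) -> float:\n    return w * h\n"
print(json.dumps(extract_annotations(example), indent=2))
```

Unannotated parameters and missing return annotations are recorded as `null` in the JSON output, which keeps annotated and unannotated code distinguishable in later processing.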
Maintaining large code bases written in dynamically typed languages, such as JavaScript or Python, c...
Code Token Type Taxonomy (CT3) is a methodology for refined evaluation of ML-based code completion a...
An intelligent tool for type annotations in Python would increase the productivity of developers. Py...
Check out the file ManyTypes4PyDataset.spec for the repository URLs and their commit SHAs. The dataset i...
The dataset was gathered on Sep. 17th, 2020 from GitHub. It has clean and complete versions (from v0....
In this paper, we present ManyTypes4TypeScript, a very large corpus for training and evaluating mach...
Dynamic languages, such as Python and JavaScript, trade static typing for developer flexibility and ...
This dataset contains Python repositories mined from GitHub on January 20, 2021. It allows a cross-dom...
Dynamic programming languages (DPLs), such as Python and Ruby, are often used for their flexibility ...
Researchers at the Delft University of Technology have developed Type4Py: a tool that uses Machine L...
A new version of the ManyTypes4Py dataset could be used to train and evaluate the TypePy model. For ...
Optional type annotations allow for enriching dynamic programming languages with static typing featu...
In dynamically typed programming languages, values have types, but variables and other constructs i...
We present Typpete, a sound type inferencer that automatically infers Python 3 type annotations. Typ...
We present Code4ML: a Large-scale Dataset of annotated Machine Learning Code, a corpus of Python cod...