The dataset was gathered from GitHub on Sep. 17th, 2020. Since v0.7 it comes in two versions, clean and complete. The clean version has 5.1K type-checked Python repositories and 1.2M type annotations; the complete version has 5.2K Python repositories and 3.3M type annotations. The source files of the clean version are type-checked using mypy. The dataset is also de-duplicated using the CD4Py tool. See the README.MD file for a description of the dataset. Notable changes in each version are documented in CHANGELOG.md. The dataset's scripts and utilities are available in its GitHub repository.
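As a rough illustration of what a "type annotation" count like the ones above could measure, the sketch below tallies parameter, return, and variable annotations in a piece of Python source using the standard `ast` module. This is an assumption made for illustration, not the dataset's own extraction pipeline: the function name `count_type_annotations` is hypothetical, and the dataset's tooling may count annotations differently.

```python
import ast


def count_type_annotations(source: str) -> int:
    """Count parameter, return, and variable annotations in Python source.

    Hypothetical helper: the counting rule here is an assumption, not the
    dataset's documented definition.
    """
    count = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = node.args
            # Collect every kind of parameter the function declares.
            params = args.posonlyargs + args.args + args.kwonlyargs
            if args.vararg:
                params.append(args.vararg)
            if args.kwarg:
                params.append(args.kwarg)
            count += sum(1 for p in params if p.annotation is not None)
            # The return annotation, if present, counts once.
            if node.returns is not None:
                count += 1
        elif isinstance(node, ast.AnnAssign):
            # Annotated assignment such as `total: float = 0.0`.
            count += 1
    return count


example = '''
def scale(x: float, factor: int = 2) -> float:
    total: float = x * factor
    return total
'''

print(count_type_annotations(example))
```

For the example function, the sketch counts two parameter annotations, one return annotation, and one annotated variable.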
We present Code4ML: a Large-scale Dataset of annotated Machine Learning Code, a corpus of Python cod...
Maintaining large code bases written in dynamically typed languages, such as JavaScript or Python, c...
A database of many different types of multivariate time series, each with between 5-25 processes and...
The dataset is gathered on Sep. 17th 2020 from GitHub. It has more than 5.2K Python repositories an...
In this paper, we present ManyTypes4Py, a large Python dataset for machine learning (ML)-based type ...
This contains artifacts for the Type4Py paper, which is accepted at the ICSE'22 technical track. ...
A new version of the ManyTypes4Py dataset that can be used to train and evaluate the Type4Py model. For ...
This dataset contains python repositories mined on GitHub on January 20, 2021. It allows a cross-dom...
Researchers at the Delft University of Technology have developed Type4Py: a tool that uses Machine L...
In this paper, we present ManyTypes4TypeScript, a very large corpus for training and evaluating mach...
Dynamic programming languages (DPLs), such as Python and Ruby, are often used for their flexibility ...
Optional type annotations allow for enriching dynamic programming languages with static typing featu...
We present Typpete, a sound type inferencer that automatically infers Python 3 type annotations. Typ...
This patch fixes a regression in Hypothesis 6.14.8, where "from_type()" failed to resolve types whic...
Code Token Type Taxonomy (CT3) is a methodology for refined evaluation of ML-based code completion a...