In this paper, we present ManyTypes4TypeScript, a very large corpus for training and evaluating machine-learning models for sequence-based type inference in TypeScript. The dataset includes over 9 million type annotations across 13,953 projects and 539,571 files. It is approximately 10x larger than analogous type inference datasets for Python and is the largest available for TypeScript. We also provide API access to the dataset, which can be integrated with any tokenizer and used with any state-of-the-art sequence-based model. Finally, we provide analysis and performance results for state-of-the-art code-specific models as baselines. ManyTypes4TypeScript is available on Hugging Face and Zenodo. This dataset was collected on Ja...
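As a rough illustration of what "sequence-based type inference" with "any tokenizer" can look like in practice, the sketch below treats the task as per-token classification and re-aligns token-level type labels after subword splitting. Everything in it is an assumption made for illustration: the `split_subwords` toy tokenizer, the `align_labels` helper, the label ids, and the use of `-100` as an ignore marker are hypothetical and are not taken from the dataset's actual schema or API.

```python
from typing import Callable, List, Tuple

# Assumed marker for positions that carry no type label (hypothetical choice).
IGNORE = -100


def split_subwords(token: str, max_len: int = 4) -> List[str]:
    """Toy stand-in for a real subword tokenizer: cut a token into fixed-size pieces."""
    return [token[i:i + max_len] for i in range(0, len(token), max_len)] or [token]


def align_labels(
    tokens: List[str],
    labels: List[int],
    tokenize: Callable[[str], List[str]] = split_subwords,
) -> Tuple[List[str], List[int]]:
    """Keep each label on the first subword of its token; mark the rest as IGNORE."""
    pieces: List[str] = []
    aligned: List[int] = []
    for token, label in zip(tokens, labels):
        sub = tokenize(token)
        pieces.extend(sub)
        aligned.extend([label] + [IGNORE] * (len(sub) - 1))
    return pieces, aligned


# Hypothetical example: only the identifier `count` carries a type label (1 = "number").
tokens = ["let", "count", ":", "number", "=", "0"]
labels = [IGNORE, 1, IGNORE, IGNORE, IGNORE, IGNORE]
pieces, aligned = align_labels(tokens, labels)
```

Swapping `split_subwords` for a real tokenizer's per-word splitting function would leave the alignment logic unchanged, which is the sense in which a token-and-label corpus can be paired with different tokenizers.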
Dynamic programming languages (DPLs), such as Python and Ruby, are often used for their flexibility ...
This dataset contains Python repositories mined from GitHub on January 20, 2021. It allows a cross-dom...
This dataset is intended to accompany the paper "Designing Types for R, Empirically" (@ OOPSLA'20, l...
In this paper, we present ManyTypes4Py, a large Python dataset for machine learning (ML)-based type ...
See the file ManyTypes4PyDataset.spec for the repository URLs and their commit SHAs. The dataset i...
The dataset was gathered from GitHub on Sep. 17th, 2020. It has clean and complete versions (from v0....
Dynamic languages, such as Python and JavaScript, trade static typing for developer flexibility and ...
A new version of the ManyTypes4Py dataset could be used to train and evaluate the TypePy model. The ...
Type migration is the process of adding types to untyped code to gain assurance at compile...
Dynamically typed languages lack information about the types of variables in the source code. Develo...
Optional type annotations allow for enriching dynamic programming languages with static typing featu...
Maintaining large code bases written in dynamically typed languages, such as JavaScript or Python, c...
Type inference is a key component of modern statically typed programming languages. It allows progra...
Type feedback and type inference are two common methods used to optimize dynamic languages such as J...