Code Token Type Taxonomy (CT3) is a methodology for refined evaluation of ML-based code completion approaches. We publish the CT3-enhanced dataset, which provides a pre-computed token type for each token in the Python150k dataset. The dataset was produced in the empirical study of the following paper: Kim Tuyen Le, Gabriel Rashidi, and Artur Andrzejak. A Methodology for Refined Evaluation of ML-based Code Completion Approaches. In Special Issue on Programming Language Processing, Data Mining and Knowledge Discovery. See the README.txt file for details on how the enhanced dataset is structured.
Context: Code completion aims to help improve developers’ productivity by suggesting the next code t...
This repository contains prompts generated by CodeIPPrompt, a platform used to assess potential inte...
The archive file contains the following materials: `repos.tar.gz`: dataset of the collected Pytho...
Code Token Type Taxonomy (CT3) is a methodology for refined evaluation of ML-based code completion a...
In this paper, we present ManyTypes4Py, a large Python dataset for machine learning (ML)-based type ...
Empirical data and code used in paper *Towards a Large-Scale Empirical Study of Python3 Type Annotat...
We present Code4ML: a Large-scale Dataset of annotated Machine Learning Code, a corpus of Python cod...
This repository contains the dataset of the manuscript: "An Empirical Study on the Usage and Availa...
The dataset of Python projects used for the study of code change patterns and their automation. The ...
The dataset was gathered on Sep. 17th 2020. It has more than 5.4K Python repositories that are hosted...
This is material for a work under review. If used, please cite accordingly. This is a reproduction ...
In this paper, we present ManyTypes4TypeScript, a very large corpus for training and evaluating mach...
This object contains the dataset and Python code used for the paper: S. Brenner and R. Sablatnig. O...
The dataset was gathered on Sep. 17th 2020 from GitHub. It has clean and complete versions (from v0....
These are the token-based data sets used for my PhD thesis ("Potentiale syntaktischer Annotationen f...