Code Token Type Taxonomy (CT3) is a methodology for the refined evaluation of ML-based code completion approaches. We publish the CT3-enhanced dataset, which contains pre-computed token types for each token in the Python150k dataset. The dataset was produced in the empirical study of the following paper: Kim Tuyen Le, Gabriel Rashidi, and Artur Andrzejak. A Methodology for Refined Evaluation of ML-based Code Completion Approaches. In KDD Workshop on Programming Language Processing (PLP), August 14-18, 2021 (Virtual). Please read the README.txt file for details on the structure of the enhanced dataset.
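As a rough illustration of what a per-token-type evaluation can look like, the sketch below groups completion predictions by token type and reports accuracy for each group alongside the overall figure. It is a minimal sketch under stated assumptions: the record layout `(token_type, expected, predicted)`, the function name `accuracy_by_token_type`, and the type labels `identifier`, `keyword`, and `literal` are all hypothetical and are not taken from the CT3 taxonomy or from the dataset's file format, which is documented in README.txt.

```python
from collections import defaultdict


def accuracy_by_token_type(records):
    """Return {token_type: accuracy} for an iterable of
    (token_type, expected_token, predicted_token) tuples.

    The tuple layout is an assumption made for this sketch; it is not
    the format of the published dataset.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for token_type, expected, predicted in records:
        totals[token_type] += 1
        if expected == predicted:
            hits[token_type] += 1
    return {t: hits[t] / totals[t] for t in totals}


# Hypothetical sample: the type labels are illustrative only.
sample = [
    ("identifier", "count", "count"),
    ("identifier", "total", "value"),
    ("keyword", "return", "return"),
    ("literal", "0", "1"),
]

per_type = accuracy_by_token_type(sample)
overall = sum(expected == predicted for _, expected, predicted in sample) / len(sample)

print(per_type)
print(overall)
```

A single aggregate accuracy can hide large differences between groups of tokens; breaking the score down by type makes such differences visible.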
This repository contains prompts generated by CodeIPPrompt, a platform used to assess potential inte...
These are the token-based data sets used for my PhD thesis ("Potentiale syntaktischer Annotationen f...
Code accompanying the paper First-Class Data Types in Shallow Embedded Domain-Specific Languages usi...
Code Token Type Taxonomy (CT3) is a methodology for refined evaluation of ML-based code completion a...
In this paper, we present ManyTypes4Py, a large Python dataset for machine learning (ML)-based type ...
Empirical data and code used in paper *Towards a Large-Scale Empirical Study of Python3 Type Annotat...
We present Code4ML: a Large-scale Dataset of annotated Machine Learning Code, a corpus of Python cod...
The dataset was gathered on Sep. 17th 2020. It has more than 5.4K Python repositories that are hosted...
The dataset of Python projects used for the study of code change patterns and their automation. The ...
This repository contains the dataset of the manuscript: "An Empirical Study on the Usage and Availa...
This is material for a work under review. If used, please cite accordingly. This is a reproduction ...
Context: Code completion aims to help improve developers’ productivity by suggesting the next code t...
In this paper, we present ManyTypes4TypeScript, a very large corpus for training and evaluating mach...
This object contains the dataset and Python code used for the paper: S. Brenner and R. Sablatnig. O...
The dataset was gathered on Sep. 17th 2020 from GitHub. It has clean and complete versions (from v0....