Machine-learning models can reach very high performance with supervised training, where they learn from labeled data. However, supervised training requires annotating data with the desired output labels, which can be difficult and time-consuming. Meanwhile, advances in deep-learning models and technology have made it possible to train very large models, which was not feasible a few years ago. Although training such large models requires a substantial amount of supervised data, models can overcome this limitation by first learning from unlabeled data. Pre-trained language models enable us to achieve state-of-the-art performance from large-scale models with limited supervised data. During the pre-training stage, models are exposed to un...
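A minimal sketch of the pre-train-then-fine-tune recipe described above, assuming the Hugging Face transformers library and PyTorch; the checkpoint name, toy texts, and labels are illustrative assumptions, not taken from the source:

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load weights already pre-trained on large-scale unlabeled text.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# A tiny labeled set stands in for the "limited supervised data" setting.
texts = ["the patch fixes the bug", "the patch breaks the build"]
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):  # a few supervised fine-tuning steps
    loss = model(**batch, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

The point of the recipe is that the expensive representation learning happens once on unlabeled text, so the supervised step above can succeed with far fewer labels than training the same model from scratch.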
Large Language Models (LLMs) are a new class of computation engines, "programmed" via prompt engineer...
The way software developers edit code day-to-day tends to be repetitive, often using existing code e...
This dissertation addresses two significant challenges of large language models (LLMs): robustness a...
Pretraining deep neural networks to perform language modeling - that is, to reconstruct missing word...
This paper studies the use of language models as a source of synthetic unlabeled text for NLP. We fo...
Thesis (Ph.D.), University of Washington, 2022. A robust language processing machine should be able to...
Improving developer productivity is an important but very difficult task that researchers from bot...
Code summarization provides a high-level natural languag...
Large language models have recently been shown to attain reasonable zero-shot ...
Pre-training language models (LMs) on large-scale unlabeled text data makes them much easier to...
Machine Learning methods, especially Deep Learning, have achieved enormous breakthroughs in Natural Languag...
Predictive modeling using machine learning is an effective method for building compiler heuristics, ...
Large Language Models (LLMs) play an ever-increasing role in the field of Artificial Intelligence (A...
In settings where Semi-Supervised Learning (SSL) is available to exploit unlabeled data, this...
State-of-the-art pre-trained language models have been shown to memorise facts and perform well wi...