Despite the recent trend of developing and applying neural source code models to software engineering tasks, the quality of such models is still insufficient for real-world use. One likely cause is noise in the source code corpora used to train such models. In this paper, we adapt data-influence methods to detect such noise. Data-influence methods are used in machine learning to evaluate how similar a target sample is to the correct samples, in order to determine whether the target sample is noisy. Our evaluation results show that data-influence methods can identify noisy samples for neural code models in classification-based tasks. This approach will contribute to the larger vision of developing better neural source code mod...
Thesis (Ph.D.)--University of Washington, 2020. Modern machine learning algorithms have been able to a...
This thesis focuses on the aspect of label noise for real-life datasets. Due to the upcoming growing...
Code smells can compromise software quality in the long term by inducing technical debt. For this re...
Deep Neural Networks (DNNs) are increasingly being used in software engineering and code intelligenc...
In this paper, we tackle the problem of finding potentially problematic sample...
The ability to identify influential training examples enables us to debug training data and explain ...
Source code and dataset for the paper titled "Severity classification of software code smells using mach...
The advancements in machine learning techniques have encouraged researchers to apply these technique...
Software engineers are increasingly asked to build datasets for engineering neural netw...
Programmatic Weak Supervision (PWS) aggregates the source votes of multiple weak supervision sources...
The popularity of machine learning has wildly expanded in recent years. Machine learning techniques ...
In recent years, the field of language modelling has witnessed exciting developments. Especially, th...
Recent improvements in machine learning methods have significantly advanced many fields including ...
Machine learning is used increasingly frequently in software engineering to automate tasks and improve...