Large Language Models (LLMs) have demonstrated strong natural language processing and code synthesis capabilities, which has led to their rapid adoption in software engineering applications. However, details about LLM training data are often not made public, which has caused concern as to whether existing bug benchmarks are included. In lieu of the training data for the popular GPT models, we examine the training data of the open-source LLM StarCoder, and find it likely that data from the widely used Defects4J benchmark was included, raising the possibility of its inclusion in GPT training data as well. This makes it difficult to tell how well LLM-based results on Defects4J would generalize, as for any results it would be unclear whether a ...
Bug prediction is aimed at identifying software artifacts that are more likely to be defective in th...
Large Language Models (LLMs) play an ever-increasing role in the field of Artificial Intelligence (A...
The number of research papers on defect prediction has increased sharply over the last decade or so. ...
Large language models (LLMs) have recently been integrated in a variety of applications including so...
As machine learning tools progress, the inevitable question arises: How can machine learning help us...
For software testing research, Defects4J stands out as the primary benchmark dataset, offering a con...
The application of machine learning (ML) and natural language processing (NLP) methods for creating...
Large language models (LLMs) have demonstrated significant potential in the realm of natural languag...
Testing is a pivotal activity in ensuring the quality of software. Code covera...
Lately, Large Language Models have been widely used in code generation. GPT4 is considered the most ...
Large Language Models (LLMs) have been gaining increasing attention and demonstrated promising perfo...
In this work, we evaluate 10 open-source instructed LLMs on four representative code comprehension a...
One of the most common solutions adopted by software researchers to address code generation is by tr...
This paper provides a survey of the emerging area of Large Language Models (LLMs) for Software Engin...
One of the critical phases in software development is software testing. Testing helps with identifyi...