One of the most common approaches adopted by software researchers to address code generation is to train Large Language Models (LLMs) on massive amounts of source code. Although a number of studies have evaluated LLMs on popular accuracy metrics (e.g., BLEU, CodeBLEU), previous research has largely overlooked the role of causal inference as a fundamental component of the interpretability of LLMs' performance. Existing benchmarks and datasets are meant to highlight the difference between the expected and the generated outcome, but they do not take into account confounding variables (e.g., lines of code, prompt size) that also influence the accuracy metrics. The fact remains that, when dealing with generative ...
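To make the confounding concern concrete, the following minimal sketch contrasts a naive comparison of accuracy scores with one that stratifies on a confounder. It is an illustration under stated assumptions, not the method of the work summarized above: the records, the binary "treated" flag (for example, a hypothetical prompt variant), the prompt-size buckets, and the function names `naive_effect` and `adjusted_effect` are all invented for this example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (treated, prompt_size_bucket, accuracy_score).
# All values are made up for illustration; they are not measured results.
RECORDS = [
    (1, "short", 0.8), (1, "short", 0.8), (1, "short", 0.8), (0, "short", 0.7),
    (1, "long", 0.4), (0, "long", 0.3), (0, "long", 0.3), (0, "long", 0.3),
]


def naive_effect(records):
    """Difference in mean score between treated and control, ignoring the confounder."""
    treated = [score for t, _, score in records if t == 1]
    control = [score for t, _, score in records if t == 0]
    return mean(treated) - mean(control)


def adjusted_effect(records):
    """Stratified estimate: average the per-bucket differences,
    weighting each bucket by its share of all records."""
    strata = defaultdict(list)
    for t, bucket, score in records:
        strata[bucket].append((t, score))

    total = len(records)
    effect = 0.0
    for bucket, rows in strata.items():
        treated = [score for t, score in rows if t == 1]
        control = [score for t, score in rows if t == 0]
        if not treated or not control:
            # Without both groups in a bucket the comparison is undefined.
            raise ValueError(f"no overlap in stratum {bucket!r}")
        effect += (len(rows) / total) * (mean(treated) - mean(control))
    return effect


if __name__ == "__main__":
    print(f"naive difference:    {naive_effect(RECORDS):.2f}")
    print(f"adjusted difference: {adjusted_effect(RECORDS):.2f}")
```

In this toy data the treated group is concentrated in the short-prompt bucket, where scores are higher for everyone, so the naive difference (about 0.3) overstates the within-bucket difference (about 0.1). Stratification only accounts for the confounders that are actually measured and bucketed.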
The increasing popularity of large language models (LLMs) has paved the way for their application in...
Large language models (LLMs) have demonstrated significant potential in the realm of natural languag...
Software development is an inherently collaborative process, where various stakeholders frequently e...
The development of large language models (LLMs) such as ChatGPT has brought a lot of attention recen...
There is abundant observational data in the software engineering domain, whereas running large-scale...
In this work, we evaluate 10 open-source instructed LLMs on four representative code comprehension a...
Large Language Models (LLMs) for code are a family of high-parameter, transformer-based neural netwo...
In this paper, we uncover a systematic bias in the evaluation paradigm of adopting large language mo...
As large language models (LLMs) continue to advance, accurately and comprehensively evaluating their...
Large Language Models (LLMs), such as GPT and BERT, have demonstrated remarkable capabilities in add...
Large Language Models (LLMs) have demonstrated strong natural language processing and code synthesis...
We present an empirical evaluation of various outputs generated by nine of the most widely-available...
Large Language Models (LLMs) have been gaining increasing attention and demonstrated promising perfo...
Some argue that scale is all that is needed to achieve AI, covering even causal models. We make it clear ...
Recently, scores of high-performing code generation systems have surfaced. As has become a popular c...