Recent research shows that language models, such as n-gram models, are useful for a wide variety of software engineering tasks, e.g., code completion, bug identification, code summarisation, etc. However, such models require appropriate settings for numerous parameters. Moreover, the different ways one can read code essentially yield different models (based on the different sequences of tokens). In this paper, we focus on n-gram models and evaluate how the choice of tokenizer, smoothing, unknown threshold and n values impacts the predictive ability of these models. To this end, we compare the use of multiple tokenizers and sets of different parameters (smoothing, unknown threshold and n values) with the aim of identifying the most appropriate combinati...
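To make the parameters named in the abstract above concrete, here is a minimal sketch, assuming a Python setting, of an n-gram model over code tokens with an unknown-token threshold and add-k smoothing; the function name, the pre-tokenized input format and the default values of n, unk_threshold and k are illustrative assumptions, not the configuration evaluated in the paper.

```python
from collections import Counter

def build_ngram_model(token_streams, n=3, unk_threshold=2, k=0.01):
    """Train a simple n-gram model over pre-tokenized code.

    Tokens seen fewer than `unk_threshold` times are mapped to <UNK>,
    and add-k smoothing reserves probability mass for unseen n-grams.
    """
    freq = Counter(tok for stream in token_streams for tok in stream)
    vocab = {t for t, c in freq.items() if c >= unk_threshold}
    vocab |= {"<UNK>", "<s>", "</s>"}

    def norm(tok):
        return tok if tok in vocab else "<UNK>"

    ngrams, contexts = Counter(), Counter()
    for stream in token_streams:
        toks = ["<s>"] * (n - 1) + [norm(t) for t in stream] + ["</s>"]
        for i in range(n - 1, len(toks)):
            ctx = tuple(toks[i - n + 1:i])
            ngrams[ctx + (toks[i],)] += 1
            contexts[ctx] += 1

    def prob(context, token):
        # add-k smoothing: every vocabulary item receives k pseudo-counts
        ctx = tuple(norm(t) for t in context)[-(n - 1):]
        return (ngrams[ctx + (norm(token),)] + k) / (contexts[ctx] + k * len(vocab))

    return prob

# Example: a bigram model over two toy code token streams
streams = [["def", "add", "(", "a", ",", "b", ")", ":"],
           ["def", "sub", "(", "a", ",", "b", ")", ":"]]
p = build_ngram_model(streams, n=2, unk_threshold=1)
print(p(["def"], "add"))  # P(add | def) under the smoothed bigram model
```

Varying the tokenizer that produces the token streams, the smoothing scheme (add-k here, purely for illustration), the unknown threshold and n yields the different model configurations that an evaluation like the one described above compares.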
The probing of software by security testers to detect possible vulnerabilities is of primary importa...
In this paper, an extension of n-grams, called x-grams, is proposed. In this extension, the memory o...
Context: Identifying defects in code early is important. A wide range of static code metrics have be...
Natural language processing techniques, in particular n-gram models, have been applied ...
We live in a time where software is used everywhere. It is used even for creating other software by ...
Analyzing source code using computational linguistics and exploiting the linguistic properties of so...
We present a tutorial introduction to n-gram models for language modeling and survey the most widely...
The recent availability of large corpora for training N-gram language models has shown the utility o...
Natural languages like English are rich, complex, and powerful. The highly creative and gra...
It seems obvious that a successful model of natural language would incorporate a great deal of both ...
The smoothing of n-gram models is a core technique in language modelling (LM). Modified Kneser-Ney ...
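As background for the smoothing discussed above, interpolated Kneser-Ney with a single discount D can be written as follows (a standard textbook formulation, not taken from the paper); modified Kneser-Ney refines it by using separate discounts D_1, D_2 and D_{3+} chosen according to the n-gram's count.

$$
P_{\mathrm{KN}}(w_i \mid w_{i-n+1}^{i-1}) =
  \frac{\max\bigl(c(w_{i-n+1}^{i}) - D,\ 0\bigr)}{c(w_{i-n+1}^{i-1})}
  + \lambda(w_{i-n+1}^{i-1})\, P_{\mathrm{KN}}(w_i \mid w_{i-n+2}^{i-1}),
\qquad
\lambda(w_{i-n+1}^{i-1}) = \frac{D}{c(w_{i-n+1}^{i-1})}\, N_{1+}(w_{i-n+1}^{i-1}\,\bullet)
$$

where $N_{1+}(h\,\bullet)$ is the number of distinct token types observed after history $h$; the recursion bottoms out in a continuation-count unigram distribution $P_{\mathrm{cont}}(w) = N_{1+}(\bullet\, w) / N_{1+}(\bullet\,\bullet)$.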
This paper systematically investigates the generation of code explanations by Large Language Models ...
N-grams have had a great impact on the state of the art in natural language parsing. They are centra...
In this paper, an extension of n-grams is proposed. In this extension, the memory of the model (n) i...