Analyzing source code using computational linguistics and exploiting the linguistic properties of source code have recently become popular topics in the domain of software engineering. In the first part of the thesis, we study the predictability of source code and determine how well source code can be represented using language models developed for natural language processing. In the second part, we study how well English discussions of source code can be aligned with code elements to create parallel corpora for English-to-code statistical machine translation. This work is organized as a “manuscript” thesis whereby each core chapter constitutes a submitted paper. The first part replicates recent works that have concluded that software is m...
Source code is a form of human communication, albeit one where the information shared between the pr...
Previously, statistical machine translation (SMT) models have been estimated from parallel corpora, ...
Large repositories of source code create new challenges and opportunities for statistical machine le...
Natural languages like English are rich, complex, and powerful. The highly creative and gra...
Statistical Machine Translation (SMT) has gained enormous popularity in recent years as natural lan...
Research at the intersection of machine learning, programming languages, and software engineering ha...
“Naturalness” of Software. This is a recent, very exciting discovery, with substantial scientific a...
Software systems are becoming popular. They are used with different platforms for different applicat...
The n-gram language model, which has its roots in statistical natural language processing, has been ...
This thesis addresses the technical and linguistic aspects of discourse-level processing in phrase-b...
The purpose of this study is to use the concepts learned in NLP (Natural Language Processing), combi...
ASE 2015: 2015 30th IEEE/ACM International Conference on Automated Software Engineering, 9-13 Nov. ...
Statistical language models have successfully been used to describe and analyze natural language doc...
Natural language processing techniques, in particular n-gram models, have been applied ...
Recent research shows that language models, such as n-gram models, are useful at a wide variety of s...