{"title":"Naturalness and Artifice of Code: Exploiting the Bi-Modality","authors":"Prem Devanbu","doi":"10.1145/3511430.3511915","DOIUrl":null,"url":null,"abstract":"While natural languages are rich in vocabulary and grammatical flexibility, most human are mundane and repetitive. This repetitiveness in natural language has led to great advances in statistical NLP methods. In our lab, we discovered (almost a decade ago) that, despite the considerable power and flexibility of programming languages, large software corpora are actually even more repetitive than NL Corpora. We also showed that this “naturalness” of code could be captured in language models, and exploited within software tools. This line of work has prospered, and been turbo-charged by the tremendous capacity and design flexibility of deep learning models. Numerous other creative and interesting applications of naturalness have ensued, from colleagues around the world, and several industrial applications have emerged. Recently, we have been studying the consequences and opportunities arising from the observation that Software is bimodal: it’s written not only to be run on machines, but also read by humans; this makes software amenable to both algorithmic analysis, and statistical prediction. Bimodality allows new ways of training machine learning models, new ways of designing analysis algorithms, and new ways to understand the practice of programming. In this talk, I will begin with a backgrounder on ”Naturalness” studies, and the promise of bimodality.","PeriodicalId":138760,"journal":{"name":"15th Innovations in Software Engineering Conference","volume":"46 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"15th Innovations in Software Engineering Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3511430.3511915","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
While natural languages are rich in vocabulary and grammatical flexibility, most human utterances are mundane and repetitive. This repetitiveness in natural language has led to great advances in statistical NLP methods. In our lab, we discovered (almost a decade ago) that, despite the considerable power and flexibility of programming languages, large software corpora are actually even more repetitive than NL corpora. We also showed that this “naturalness” of code could be captured in language models and exploited within software tools. This line of work has prospered, and has been turbo-charged by the tremendous capacity and design flexibility of deep learning models. Numerous other creative and interesting applications of naturalness have ensued, from colleagues around the world, and several industrial applications have emerged. Recently, we have been studying the consequences and opportunities arising from the observation that software is bimodal: it is written not only to be run on machines, but also to be read by humans; this makes software amenable to both algorithmic analysis and statistical prediction. Bimodality allows new ways of training machine learning models, new ways of designing analysis algorithms, and new ways to understand the practice of programming. In this talk, I will begin with a backgrounder on “naturalness” studies, and then turn to the promise of bimodality.
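
To make the “naturalness” measurement concrete: the early studies scored held-out text with n-gram language models, using per-token cross-entropy (lower means more predictable, i.e. more repetitive). The Python sketch below is a minimal illustration of that idea under stated assumptions, not the actual experimental setup from these studies: it uses a trigram model with simple add-one smoothing over whitespace-split tokens, whereas the original work used larger n-gram models with more sophisticated smoothing, and the toy corpora here are hypothetical.

    # Minimal sketch (assumed setup, not the studies' actual one): measure how
    # repetitive a token stream is via trigram cross-entropy on held-out text.
    import math
    from collections import Counter

    def ngrams(tokens, n):
        """Yield n-grams, padding the front so every token has a context."""
        padded = ["<s>"] * (n - 1) + list(tokens)
        for i in range(len(tokens)):
            yield tuple(padded[i:i + n])

    def train_trigram(tokens):
        """Count trigram and bigram-context frequencies over a training corpus."""
        tri = Counter(ngrams(tokens, 3))
        bi = Counter(g[:2] for g in ngrams(tokens, 3))
        return tri, bi, len(set(tokens))

    def cross_entropy(tokens, tri, bi, vsize):
        """Average negative log2 probability per token, add-one smoothed."""
        total = 0.0
        for g in ngrams(tokens, 3):
            p = (tri[g] + 1) / (bi[g[:2]] + vsize)
            total -= math.log2(p)
        return total / len(tokens)

    # Hypothetical toy data: a repetitive "code" corpus scores a held-out
    # snippet with low cross-entropy, reflecting its predictability.
    code = "for i in range ( n ) : total += x [ i ]".split() * 20
    tri, bi, v = train_trigram(code)
    held_out = "for j in range ( m ) : total += y [ j ]".split()
    print(cross_entropy(held_out, tri, bi, v))

In the published studies, the striking finding was that code corpora yield markedly lower cross-entropy than natural-language corpora under comparable models, which is what licenses the claim that software is even more repetitive than natural language.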