No Strings Attached: An Empirical Study of String-related Software Bugs
Authors: A. Eghbali, Michael Pradel
Venue: 2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE)
Published: 2020-09-01
DOI: 10.1145/3324884.3416576 (https://doi.org/10.1145/3324884.3416576)
Citations: 9
Abstract
Strings play many roles in programming because they often contain complex and semantically rich information. For example, programmers use strings to filter inputs via regular expression matching, to express the names of program elements accessed through some form of reflection, to embed code written in another formal language, and to assemble textual output produced by a program. The omnipresence of strings leads to a wide range of mistakes that developers may make, yet little is currently known about these mistakes. This lack of knowledge about string-related bugs leads developers to repeat the same mistakes again and again, and results in poor support for finding and fixing such bugs. This paper presents the first empirical study of the root causes, consequences, and other properties of string-related bugs. We systematically study 204 string-related bugs in a diverse set of projects written in JavaScript, a language where strings play a particularly important role. Our findings include (i) that many string-related mistakes are caused by a recurring set of root cause patterns, such as incorrect string literals and regular expressions, (ii) that string-related bugs have a diverse set of consequences, including incorrect output or silent omission of expected behavior, (iii) that fixing string-related bugs often requires changing just a single line, with many of the required repair ingredients available in the surrounding code, (iv) that string-related bugs occur across all parts of applications, including the core components, and (v) that almost none of these bugs are detected by existing static analyzers. Our findings not only show the importance and prevalence of string-related bugs, but also help developers avoid common mistakes and help tool builders tackle the challenge of finding and fixing such bugs.
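
The abstract names incorrect regular expressions as a recurring root cause, silent misbehavior as a typical consequence, and single-line changes as a common fix. The sketch below illustrates how those three properties can coincide in JavaScript. It is a hypothetical example written for this summary, not one of the 204 bugs studied in the paper, and the function names `isJsonFileBuggy` and `isJsonFileFixed` are invented for illustration.

```javascript
// Hypothetical illustration; not taken from the bugs studied in the paper.
// Both function names are invented for this sketch.

// Buggy: the unescaped "." matches any character, so a name that merely
// ends in "json" (with no dot before it) is silently accepted.
function isJsonFileBuggy(name) {
  return /.json$/.test(name);
}

// Fixed: escaping the dot is a single-line, one-character change.
function isJsonFileFixed(name) {
  return /\.json$/.test(name);
}

console.log(isJsonFileBuggy("config.json")); // true
console.log(isJsonFileBuggy("myjson"));      // true (wrongly accepted)
console.log(isJsonFileFixed("myjson"));      // false
```

Nothing in the buggy version throws or logs an error; it simply accepts inputs it should reject, which is the kind of silent behavior the abstract lists among the consequences.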