How Do Programmers Express High-Level Concepts using Primitive Data Types?
Yusuke Shinyama, Yoshitaka Arahori, K. Gondow
2021 28th Asia-Pacific Software Engineering Conference (APSEC), published 2021-12-01
DOI: 10.1109/APSEC53868.2021.00043 (https://doi.org/10.1109/APSEC53868.2021.00043)
Abstract
We investigated how programmers express high-level concepts, such as path names and coordinates, using primitive data types. Although heavy reliance on primitive data types is sometimes criticized as a bad smell, it remains a common practice among programmers. We propose a novel way to accurately identify expressions for certain predefined concepts by examining API calls. We defined twelve conceptual types used in the Java Standard API. We then obtained expressions for each conceptual type from 26 open source projects. Based on the expressions obtained, we trained a decision tree-based classifier. It achieved an F-score of 83% in predicting the conceptual type of a given expression. Our result indicates that it is possible to infer a conceptual type from source code reasonably well once enough examples are given. The obtained classifier can be used for potential bug detection, test case generation, and documentation.
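To make the idea of identifying concepts through API calls concrete, here is a minimal Java sketch. It is not the authors' implementation. It assumes that a primitive-typed expression can be labeled by the Java Standard API parameter it is passed to. The `Concept` labels, the `SINKS` table, and the `conceptOf` helper are hypothetical names introduced only for illustration; the abstract does not list the twelve conceptual types, so only the two examples it mentions (path names and coordinates) appear here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ConceptualTypeSketch {
    // Hypothetical labels; the paper's twelve conceptual types are not given in the abstract.
    public enum Concept { PATH_NAME, X_COORDINATE, Y_COORDINATE, UNKNOWN }

    // Hypothetical table keyed by "callee#argumentIndex".
    // The callees are real Java Standard API constructors:
    //   java.io.File(String pathname), java.io.FileReader(String fileName),
    //   java.awt.Point(int x, int y).
    private static final Map<String, Concept> SINKS = new LinkedHashMap<>();
    static {
        SINKS.put("java.io.File.<init>#0", Concept.PATH_NAME);
        SINKS.put("java.io.FileReader.<init>#0", Concept.PATH_NAME);
        SINKS.put("java.awt.Point.<init>#0", Concept.X_COORDINATE);
        SINKS.put("java.awt.Point.<init>#1", Concept.Y_COORDINATE);
    }

    // Label the expression passed as argument `argIndex` of `callee`.
    // Anything not in the table is reported as UNKNOWN rather than guessed.
    public static Concept conceptOf(String callee, int argIndex) {
        return SINKS.getOrDefault(callee + "#" + argIndex, Concept.UNKNOWN);
    }

    public static void main(String[] args) {
        // In `new File(dir + "/out.txt")`, the String expression is argument 0.
        System.out.println(conceptOf("java.io.File.<init>", 0));   // prints PATH_NAME
        // In `new Point(col * w, row * h)`, `row * h` is argument 1.
        System.out.println(conceptOf("java.awt.Point.<init>", 1)); // prints Y_COORDINATE
        // A call outside the table gives no label.
        System.out.println(conceptOf("java.lang.Math.abs", 0));    // prints UNKNOWN
    }
}
```

Expressions labeled this way could serve as training examples for a classifier, which would then predict a conceptual type for expressions that never reach a known API call. How the paper derives features from an expression is not stated in the abstract, so that step is left out of the sketch.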