Do Experts Agree About Smelly Infrastructure?

IF 6.5 | Tier 1, Computer Science | Q1, COMPUTER SCIENCE, SOFTWARE ENGINEERING

IEEE Transactions on Software Engineering | Pub Date: 2025-03-21 | DOI: 10.1109/TSE.2025.3553383

Sogol Masoumzadeh; Nuno Saavedra; Rungroj Maipradit; Lili Wei; João F. Ferreira; Dániel Varró; Shane McIntosh

Volume 51, issue 5, pages 1472-1486 | Publication type: Journal Article | Open access: no | https://ieeexplore.ieee.org/document/10934743/

Citation count: 0

Abstract

Code smells are anti-patterns that violate code understandability, re-usability, changeability, and maintainability. It is important to identify code smells and locate them in the code. For this purpose, automated detection of code smells is a sought-after feature for development tools; however, the design and evaluation of such tools depend on the quality of oracle datasets. The typical approach for creating an oracle dataset involves multiple developers independently inspecting code examples and annotating the code smells they contain. Since multiple inspectors cast votes about each code example, the inspectors may disagree about the presence of smells. Such disagreements introduce ambiguity into how smells should be interpreted. Prior work has studied developer perceptions of code smells in traditional source code; however, smells in Infrastructure-as-Code (IaC) have not been investigated. To understand the real-world impact of disagreements among developers and their perceptions of IaC code smells, we conduct an empirical study on the oracle dataset of GLITCH, a state-of-the-art detection tool for security code smells in IaC. We analyze GLITCH's oracle dataset for code smell issues, their types, and the individual annotations of the inspectors. Furthermore, we investigate possible confounding factors associated with the incidence of developers' misaligned perceptions of IaC code smells. Finally, we triangulate developer perceptions of code smells in traditional source code with our results on IaC. Our study reveals that, unlike developer perceptions of smells in traditional source code, their perceptions of smells in IaC are more substantially impacted by subjective interpretation of smell types and their co-occurrence relationships. For instance, the interpretations of admins by default, empty passwords, and hard-coded secrets vary considerably among raters and are more susceptible to misidentification than other IaC code smells. Consequently, the manual identification of IaC code smells involves annotation disagreements among developers: 46.3% of studied IaC code smell incidences have at least one dissenting vote among three inspectors. Meanwhile, only 1.6% of code smell incidences in traditional source code are affected by inspector bias stemming from these disagreements. Hence, relying solely on majority voting would not fully represent the breadth of interpretation of the IaC under scrutiny.
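The distinction the abstract draws between a majority-vote label and the share of incidences with at least one dissenting vote can be illustrated with a minimal sketch. The vote rows below are made up for illustration; they are not taken from GLITCH's oracle dataset, whose actual format is not described here, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical annotations: each row holds three inspectors' votes for one
# code smell incidence (True = "smell present"). Not GLITCH's real data.
votes = [
    (True, True, True),
    (True, True, False),
    (False, False, False),
    (True, False, False),
]


def majority_label(row):
    """Return the label chosen by most inspectors (odd panel, so no ties)."""
    return Counter(row).most_common(1)[0][0]


def has_dissent(row):
    """True when at least one inspector disagrees with the others."""
    return len(set(row)) > 1


def dissent_rate(rows):
    """Fraction of incidences with at least one dissenting vote."""
    return sum(has_dissent(r) for r in rows) / len(rows)


labels = [majority_label(r) for r in votes]
rate = dissent_rate(votes)
print(labels)          # [True, True, False, False]
print(f"{rate:.1%}")   # 50.0%
```

In this toy input, majority voting yields a single clean label for every row, yet half of the rows carried a dissenting vote. That is the kind of information the abstract argues is lost when only the majority label is kept.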


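Source journal

IEEE Transactions on Software Engineering (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 9.70
Self-citation rate: 10.80%
Publication volume: 724
Review time: 6 months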

About the journal: IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include:
a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models.
b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects.
c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards.
d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues.
e) System issues: Hardware-software trade-offs.
f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.