EnseSmells: Deep ensemble and programming language models for automated code smells detection

IF 3.7 | Zone 2, Computer Science | JCR Q1, COMPUTER SCIENCE, SOFTWARE ENGINEERING

Journal of Systems and Software | Pub Date: 2025-02-15 | DOI: 10.1016/j.jss.2025.112375

Anh Ho, Anh M.T. Bui, Phuong T. Nguyen, Amleto Di Salle, Bach Le

Volume 224, Article 112375. URL: https://www.sciencedirect.com/science/article/pii/S0164121225000433

Citations: 0

Abstract

A code smell is an indication of suboptimal design and implementation decisions in software source code. Smells can hinder code understanding and, in turn, make the code more prone to changes and faults. Identifying these issues early in the software development process can mitigate such problems and improve the overall quality of the software. Current research primarily uses deep learning-based models to exploit the contextual information concealed within source code instructions to detect code smells, with limited attention given to structural and design-related features. This paper proposes a novel approach to code smell detection: a deep learning architecture that fuses structural features with statistical semantics derived from pre-trained models for programming languages. We further provide a thorough analysis of how different source code embedding models affect detection performance for different code smell types. Using four widely used code smells from well-designed datasets, our empirical study shows that incorporating design-related features significantly improves detection accuracy, outperforming state-of-the-art methods on the MLCQ dataset with improvements ranging from 5.98% to 28.26%, depending on the type of code smell.
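The abstract does not specify how structural features and embedding-based semantics are combined, so the following is only a minimal sketch of one common option, concatenation-based fusion followed by a logistic score. Everything in it is an assumption for illustration: the function names, the two design metrics (lines of code and cyclomatic complexity), the three-dimensional stand-in embedding, and the weights are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def standardize(metrics: Sequence[float],
                means: Sequence[float],
                stds: Sequence[float]) -> List[float]:
    """Z-score each design metric so it is on a scale comparable to the embedding."""
    return [(m - mu) / sd if sd > 0 else 0.0
            for m, mu, sd in zip(metrics, means, stds)]


def fuse(embedding: Sequence[float], metrics: Sequence[float]) -> List[float]:
    """Fuse the two views by simple concatenation (an assumed fusion strategy)."""
    return list(embedding) + list(metrics)


def smell_probability(features: Sequence[float],
                      weights: Sequence[float],
                      bias: float) -> float:
    """Logistic score over the fused vector; a stand-in for a trained classifier head."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical inputs: a tiny stand-in for a pre-trained code embedding,
# and two design metrics (lines of code, cyclomatic complexity).
embedding = [0.2, -0.1, 0.4]
raw_metrics = [120.0, 15.0]
scaled = standardize(raw_metrics, means=[60.0, 5.0], stds=[30.0, 5.0])

fused = fuse(embedding, scaled)
# Illustrative weights that only look at the structural part of the vector.
prob = smell_probability(fused, weights=[0.0, 0.0, 0.0, 0.5, 0.5], bias=-1.0)
print(fused, round(prob, 3))
```

In a real pipeline the embedding would come from a pre-trained programming language model and the classifier would be learned from labeled data such as MLCQ; the fixed weights here exist only to make the data flow visible.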


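Source journal

Journal of Systems and Software (Engineering & Technology, Computer Science: Theory & Methods)

CiteScore: 8.60
Self-citation rate: 5.70%
Articles published: 193
Review time: 16 weeks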

About the journal: The Journal of Systems and Software publishes papers covering all aspects of software engineering and related hardware-software-systems issues. All articles should include a validation of the idea presented, e.g. through case studies, experiments, or systematic comparisons with other approaches already in practice. Topics of interest include, but are not limited to:
• Methods and tools for, and empirical studies on, software requirements, design, architecture, verification and validation, maintenance and evolution
• Agile, model-driven, service-oriented, open source and global software development
• Approaches for mobile, multiprocessing, real-time, distributed, cloud-based, dependable and virtualized systems
• Human factors and management concerns of software development
• Data management and big data issues of software systems
• Metrics and evaluation, data mining of software development resources
• Business and economic aspects of software development processes
The journal welcomes state-of-the-art surveys and reports of practical experience for all of these topics.