Multi-label learning for identifying co-occurring class code smells

IF 2.8 | CAS Zone 3, Computer Science | JCR Q2, COMPUTER SCIENCE, THEORY & METHODS

Computing | Published: 2024-05-27 | DOI: 10.1007/s00607-024-01294-x

Mouna Hadj-Kacem, Nadia Bouassida

Citations: 0

Abstract

Code smell identification is crucial in software maintenance. The existing literature mostly focuses on identifying a single code smell at a time. In practice, however, a software artefact typically exhibits several code smells simultaneously: studies assessing the diffuseness of code smells suggest that 59% of smelly classes are affected by more than one smell. To address this real-world complexity, we propose a multi-label learning approach that identifies eight code smells at the class level, i.e., in the most severe software artefacts, which need to be prioritized in the refactoring process. In our experiments, we applied 12 algorithms from different multi-label learning methods to 30 open-source Java projects and report significant findings. We explored co-occurrences between class code smells and examined the impact of these correlations on prediction results. We also compared multi-label learning methods based on data adaptation with those based on algorithm adaptation. Our findings highlight the effectiveness of Ensemble of Classifier Chains and Binary Relevance in achieving high performance.
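The abstract contrasts Binary Relevance, which trains one independent classifier per smell, with classifier chains, in which each link also receives the earlier smells in the chain as extra features so that co-occurrence can inform later predictions. The following is a minimal, standard-library-only sketch under stated assumptions, not the authors' implementation or experimental setup: the metrics, smell names, toy data, and the 1-nearest-neighbour base learner are all hypothetical stand-ins, and the Ensemble of Classifier Chains (ECC) shown is simplified. It also computes pairwise smell co-occurrence and the share of smelly classes carrying more than one smell, the kind of statistic behind the 59% figure.

```python
import random
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each class is described by three made-up metrics
# (lines of code, number of methods, number of fields) and a set of smells.
# The smell names are illustrative, not the paper's eight smells.
LABELS = ["GodClass", "DataClass", "ComplexClass"]
X = [[900, 40, 30], [850, 35, 25], [120, 2, 15], [100, 3, 12], [300, 10, 5], [50, 2, 2]]
Y = [{"GodClass", "ComplexClass"}, {"GodClass", "ComplexClass"},
     {"DataClass"}, {"DataClass"}, {"ComplexClass"}, set()]


def cooccurrence(label_sets):
    """Count how often each unordered pair of smells affects the same class."""
    pairs = Counter()
    for labels in label_sets:
        for a, b in combinations(sorted(labels), 2):
            pairs[(a, b)] += 1
    return pairs


def multi_smell_ratio(label_sets):
    """Fraction of smelly classes (at least one smell) with more than one smell."""
    smelly = [s for s in label_sets if s]
    return sum(len(s) > 1 for s in smelly) / len(smelly) if smelly else 0.0


# Hypothetical stand-in base learner: 1-nearest neighbour, squared Euclidean distance.
def nn_fit(X, y):
    return list(zip(X, y))


def nn_predict(model, x):
    return min(model, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[1]


# Binary Relevance: one independent binary model per smell; correlations ignored.
def br_fit(X, Y, labels):
    return {smell: nn_fit(X, [int(smell in y) for y in Y]) for smell in labels}


def br_predict(models, x):
    return {smell for smell, m in models.items() if nn_predict(m, x)}


# Classifier Chain: model i also sees the i earlier smells in the chain
# (true labels during training, predicted labels at inference).
def cc_fit(X, Y, order):
    models = []
    for i, smell in enumerate(order):
        Xa = [list(x) + [int(p in y) for p in order[:i]] for x, y in zip(X, Y)]
        models.append(nn_fit(Xa, [int(smell in y) for y in Y]))
    return models


def cc_predict(models, order, x):
    preds = []
    for m in models:
        preds.append(nn_predict(m, list(x) + preds))
    return {smell for smell, p in zip(order, preds) if p}


# Simplified ECC: several chains with random label orders, combined by vote.
# Full ECC usually also trains each chain on a random sample of the
# training data; that step is omitted here for brevity.
def ecc_fit(X, Y, labels, n_chains=5, seed=0):
    rng = random.Random(seed)
    ensemble = []
    for _ in range(n_chains):
        order = labels[:]
        rng.shuffle(order)
        ensemble.append((order, cc_fit(X, Y, order)))
    return ensemble


def ecc_predict(ensemble, labels, x, threshold=0.5):
    votes = Counter()
    for order, models in ensemble:
        votes.update(cc_predict(models, order, x))
    return {smell for smell in labels if votes[smell] / len(ensemble) >= threshold}


if __name__ == "__main__":
    print(dict(cooccurrence(Y)))  # {('ComplexClass', 'GodClass'): 2}
    print(multi_smell_ratio(Y))   # 0.4
    br = br_fit(X, Y, LABELS)
    print(sorted(br_predict(br, [880, 38, 28])))  # ['ComplexClass', 'GodClass']
    ens = ecc_fit(X, Y, LABELS)
    print(sorted(ecc_predict(ens, LABELS, [880, 38, 28])))  # ['ComplexClass', 'GodClass']
```

On this tiny data set all methods agree; the sketch only illustrates structure. Binary Relevance treats each smell in isolation, while each chain link conditions on the smells before it, and averaging over random chain orders reduces the dependence of ECC on any single order.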

Source journal

Computing (Engineering & Technology, Computer Science: Theory & Methods)

CiteScore: 8.20
Self-citation rate: 2.70%
Articles published: 107
Review time: 3 months

Journal description: Computing publishes original papers, short communications and surveys on all fields of computing. Contributions should be written in English and may be theoretical or applied in nature; the essential criteria are computational relevance and a systematic foundation of results.