Feature envy detection based on cross-graph local semantics matching

IF 3.8 · Zone 2 (Computer Science) · JCR Q2 (Computer Science, Information Systems)

Information and Software Technology · Pub Date: 2024-06-12 · DOI: 10.1016/j.infsof.2024.107515

Quanxin Yang, Dongjin Yu, Xin Chen, Yihang Xu, Wangliang Yan, Bin Hu

Information and Software Technology, volume 174, Article 107515. https://www.sciencedirect.com/science/article/pii/S0950584924001204

Citations: 0

Abstract

Context:

Feature envy is a typical code smell: it occurs when a method relies excessively on the functionality of another class, which can harm the maintainability and extensibility of the code. Detecting and avoiding feature envy is therefore critical in software development. Previous research has shown that deep learning-based approaches have significant advantages over static code analysis tools for detecting feature envy. However, current deep learning-based approaches still suffer from two limitations: (1) they focus on the functional or overall semantics of the code and ignore opportunities for local code semantics matching, which makes some more complex cases hard to identify; (2) existing feature envy datasets are collected or synthesized using static code analysis tools, which restricts feature envy cases to fixed rules and makes it hard to cover other complex cases in real projects.
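To make the smell concrete, here is a minimal sketch in Python. The classes `Customer` and `Invoice`, the method `shipping_label`, and the helper `attribute_access_counts` are all hypothetical and do not come from the paper. The helper simply counts attribute accesses per receiver, which is the kind of fixed-rule heuristic the abstract describes as limiting; it is shown only to illustrate what "excessive reliance on another class" looks like in code.

```python
import ast
from collections import Counter

# Hypothetical example: Invoice.shipping_label uses only Customer's data.
SOURCE = '''
class Customer:
    def __init__(self, name, street, city):
        self.name = name
        self.street = street
        self.city = city

class Invoice:
    def __init__(self, customer, total):
        self.customer = customer
        self.total = total

    def shipping_label(self, customer):
        # Touches three fields of Customer and none of Invoice.
        return customer.name + ", " + customer.street + ", " + customer.city
'''


def attribute_access_counts(source, class_name, method_name):
    """Count attribute accesses per receiver name inside one method."""
    tree = ast.parse(source)
    for cls in ast.walk(tree):
        if isinstance(cls, ast.ClassDef) and cls.name == class_name:
            for fn in cls.body:
                if isinstance(fn, ast.FunctionDef) and fn.name == method_name:
                    counts = Counter()
                    for node in ast.walk(fn):
                        # Only count accesses of the form `name.attr`.
                        if isinstance(node, ast.Attribute) and isinstance(node.value, ast.Name):
                            counts[node.value.id] += 1
                    return counts
    raise ValueError("method not found")


counts = attribute_access_counts(SOURCE, "Invoice", "shipping_label")
print(dict(counts))
```

In this toy case the method never touches `self`, so it would be a natural candidate for a Move Method refactoring into `Customer`. Real cases are rarely this clear-cut, which is the gap the learned approach targets.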

Objective:

We propose a Siamese graph neural network based on local code semantics matching, and we collect feature envy refactoring cases from real projects for experimental evaluation.

Method:

To address the first issue, we propose a cross-graph local semantics matching network, which aims to simulate human intuition or experience by analyzing the local semantics matching between code graphs to detect feature envy. To address the second, we manually review and collect commits on GitHub that refactor feature envy cases. We then draw on image data augmentation techniques to construct two datasets, one for identifying feature envy and one for recommending Move Method refactorings.
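The abstract does not give the network's architecture, so the following is only a generic sketch of what matching local semantics across two graphs can look like, not the authors' model. It assumes a shared ("Siamese") one-step message-passing encoder and a softmax attention from the nodes of one graph to the nodes of the other; the function names, feature sizes, and the random toy graphs are all illustrative assumptions.

```python
import numpy as np


def encode(features, adjacency, weight):
    """One shared message-passing step: average self + neighbours, then project."""
    a_hat = adjacency + np.eye(adjacency.shape[0])      # add self-loops
    a_hat = a_hat / a_hat.sum(axis=1, keepdims=True)    # row-normalise
    return np.tanh(a_hat @ features @ weight)


def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)             # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_graph_match(h_a, h_b):
    """For each node of graph A, attend over nodes of graph B.

    Returns the residual between each A node and its attention-weighted
    summary of B, plus the attention matrix itself.
    """
    attn = softmax(h_a @ h_b.T, axis=1)                 # shape (n_a, n_b)
    return h_a - attn @ h_b, attn


def path_graph(n):
    """Adjacency matrix of a simple undirected path with n nodes."""
    return np.eye(n, k=1) + np.eye(n, k=-1)


rng = np.random.default_rng(0)
weight = rng.normal(size=(4, 8))                        # shared by both branches

# Toy stand-ins for a method graph (5 nodes) and a candidate class graph (7 nodes).
h_method = encode(rng.normal(size=(5, 4)), path_graph(5), weight)
h_class = encode(rng.normal(size=(7, 4)), path_graph(7), weight)

residual, attn = cross_graph_match(h_method, h_class)
mismatch = float(np.linalg.norm(residual, axis=1).mean())
print(attn.shape, residual.shape, mismatch)
```

In a sketch like this, a small `mismatch` value means the method's nodes find close counterparts in the class graph. A trained model would learn the encoder weights and a classifier on top of such matching signals rather than use a raw norm.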

Results:

Extensive experiments show that our approach outperforms state-of-the-art baselines on both tasks in terms of the comprehensive metrics F1-score and AUC.
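For readers unfamiliar with the two metrics, the sketch below computes them from scratch on a small made-up example. The labels, scores, and the 0.5 decision threshold are assumptions chosen for illustration and are not results from the paper. F1 is the harmonic mean of precision and recall at a fixed threshold; AUC is computed here as the fraction of (positive, negative) pairs that the scores rank correctly, with ties counted as half.

```python
def f1_score(labels, preds):
    """Harmonic mean of precision and recall for binary labels."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(labels, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = feature envy) and model scores, for illustration only.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.65, 0.3, 0.2, 0.7]
preds = [1 if s >= 0.5 else 0 for s in scores]   # assumed threshold of 0.5

f1 = f1_score(labels, preds)
area = auc(labels, scores)
print(f1, area)
```

F1 depends on the chosen threshold, whereas AUC summarises ranking quality across all thresholds, which is why the two are often reported together.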

Conclusion:

The experimental results indicate that the proposed Siamese graph neural network based on local code semantics matching is effective. In addition, the provided data augmentation algorithms can significantly improve model performance.

Source journal

Information and Software Technology (Engineering & Technology; Computer Science: Software Engineering)

CiteScore

9.10

Self-citation rate

7.70%

Articles published

164

Review time

9.6 weeks

About the journal: Information and Software Technology is the international archival journal focusing on research and experience that contributes to the improvement of software development practices. The journal's scope includes methods and techniques to better engineer software and manage its development. Articles submitted for review should have a clear component of software engineering or address ways to improve the engineering and management of software development. Areas covered by the journal include:
• Software management, quality and metrics
• Software processes
• Software architecture, modelling, specification, design and programming
• Functional and non-functional software requirements
• Software testing and verification & validation
• Empirical studies of all aspects of engineering and managing software development
Short Communications is a new section dedicated to short papers addressing new ideas, controversial opinions, "negative" results and much more. Read the Guide for Authors for more information. The journal encourages and welcomes submissions of systematic literature studies (reviews and maps) within the scope of the journal. Information and Software Technology is the premier outlet for systematic literature studies in software engineering.