Boosting Identifier Renaming Opportunity Identification via Context-Based Deep Code Representation

Impact Factor 5.7; Zone 2, Computer Science; JCR Q1, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IEEE Transactions on Reliability; Pub Date: 2025-02-19; DOI: 10.1109/TR.2025.3535736

Jingxuan Zhang, Zhuhang Li, Jiahui Liang, Zhiqiu Huang

IEEE Transactions on Reliability, vol. 74, no. 3, pp. 3296-3310

Citations: 0

Abstract

Source code refactoring brings many benefits to the software being developed; e.g., it reduces the likelihood of future development failures and simplifies the implementation of new features. Among the various code refactoring activities, identifier renaming is one of the most frequent activities developers perform, and it plays an important role in program analysis and understanding. However, manually detecting identifier renaming opportunities is time-consuming and labor-intensive. Researchers have recently proposed several approaches that automatically identify renaming opportunities for identifiers. However, existing approaches focus on only one or a few specific types of identifiers rather than covering all identifier types. To resolve this problem, we put forward a new approach that detects identifier renaming opportunities by fully exploiting changes in the programming context and the related code entities. Specifically, we first use a siamese network, which employs different attention heads to incorporate the programming context and the related code entities, to derive semantically meaningful embeddings of identifiers. We then use these embeddings to train a classifier that predicts renaming opportunities for identifiers. Experimental results on 29 255 identifiers from ten Java projects in the Apache community demonstrate that our approach outperforms the state-of-the-art baseline approach by 11.97% in average F-Measure when identifying renaming opportunities for all types of identifiers. In addition, we verify the effectiveness of several key components of our approach. For instance, incorporating the related code entities into our approach improves the average F-Measure by 6.60%.
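The abstract describes the pipeline only at a high level. As a rough illustration, here is a minimal, untrained Python sketch (standard library only) of the general pattern it names: a siamese encoder whose two branches share weights, with separate attention heads pooling programming-context tokens and related-code-entity tokens, followed by a classifier over the pair embedding and the F-Measure used for evaluation. Everything concrete here is an assumption rather than the authors' design: the token vectors, comparing an identifier's before and after versions, the |e1 - e2| pair feature, the random attention queries, logistic regression as the classifier, and the helper names (`TwinEncoder`, `pair_features`, `train_logistic`, `f_measure`), which are hypothetical.

```python
import math
import random

Vector = list[float]  # a token or embedding vector (hypothetical representation)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Vector) -> Vector:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z: float) -> float:
    # Branch on the sign so math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class AttentionHead:
    """Attention pooling with a single query vector (random here; learned in a real model)."""

    def __init__(self, dim: int, rng: random.Random):
        self.query = [rng.uniform(-1.0, 1.0) for _ in range(dim)]

    def pool(self, tokens: list[Vector]) -> Vector:
        # Weight each token by its softmax-normalized similarity to the query, then sum.
        weights = softmax([dot(self.query, t) for t in tokens])
        return [sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(len(tokens[0]))]


class TwinEncoder:
    """One encoder shared by both branches of the siamese pair (hypothetical design)."""

    def __init__(self, dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.context_head = AttentionHead(dim, rng)  # attends over programming-context tokens
        self.entity_head = AttentionHead(dim, rng)   # attends over related-code-entity tokens

    def embed(self, name_vec: Vector, context: list[Vector], entities: list[Vector]) -> Vector:
        # Concatenate the identifier vector with the two attention-pooled summaries.
        return name_vec + self.context_head.pool(context) + self.entity_head.pool(entities)


def pair_features(enc: TwinEncoder, before: tuple, after: tuple) -> Vector:
    # Both versions go through the SAME weights; |e1 - e2| measures how much changed.
    e1, e2 = enc.embed(*before), enc.embed(*after)
    return [abs(a - b) for a, b in zip(e1, e2)]


def train_logistic(X: list[Vector], y: list[int], lr: float = 0.5,
                   epochs: int = 500) -> tuple[Vector, float]:
    # Plain SGD on the log loss; an assumed stand-in, since the abstract does not name the classifier.
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            g = sigmoid(dot(w, xi) + b) - yi  # derivative of the log loss w.r.t. the logit
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b


def predict(w: Vector, b: float, x: Vector) -> int:
    return 1 if dot(w, x) + b > 0 else 0


def f_measure(y_true: list[int], y_pred: list[int]) -> float:
    # F-Measure: harmonic mean of precision and recall on the positive (rename) class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In a real pipeline the attention queries would be trained (for example with a contrastive objective on the twin branches) before the classifier is fit; they stay random here so the sketch runs with the standard library alone. Because both branches call the same `TwinEncoder`, identical inputs always yield a zero pair feature, which is the property that makes the siamese structure useful for detecting change.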

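Source Journal

IEEE Transactions on Reliability (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 12.20
Self-citation rate: 8.50%
Articles published: 153
Review time: 7.5 months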

Journal description: IEEE Transactions on Reliability is a refereed journal for reliability and allied disciplines including, but not limited to, maintainability, physics of failure, life testing, prognostics, design and manufacture for reliability, reliability for systems of systems, network availability, mission success, warranty, safety, and various measures of effectiveness. Topics eligible for publication range from hardware to software, from materials to systems, from consumer and industrial devices to manufacturing plants, from individual items to networks, and from techniques for making things better to ways of predicting and measuring behavior in the field. As an engineering subject that supports new and existing technologies, we constantly expand into new areas of the assurance sciences.