A Cross-Language Name Binding Recognition and Discrimination Approach for Identifiers

2023 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). Published: 2023-03-01. DOI: 10.1109/SANER56733.2023.00115

Yue Ju, Yixuan Tang, Jinpeng Lan, Xiangbo Mi, Jingxuan Zhang


Abstract

Software developers usually rename identifiers and propagate the renaming based on the name binding of identifiers. Software applications are now often developed using more than one language to enhance their functions and behaviors. Hence, an identifier renaming frequently affects more than one language in a multiple-language application. However, existing name binding approaches for identifiers focus only on a single language and do not consider the cross-language scenario. In this paper, we propose a deep-learning-based cross-language name binding approach for Java frameworks. Specifically, we first detect potential name binding pairs via string matching. By analyzing the name binding pairs, the context, and the framework rules of identifiers, we extract several deep semantic features of identifiers, employ the BERT pre-trained model to recognize the name binding for unique identifiers, and further combine several classifiers to discriminate the name binding for duplicate identifiers. Our approach is evaluated on a manually constructed experimental dataset from 10 multiple-language projects. Experimental results demonstrate that our approach achieves an average F-Measure of 85.14% on unique identifiers and 86.57% on duplicate identifiers, significantly outperforming the baseline approaches. We also compare the performance of our approach against IntelliJ IDEA to further show its usefulness for developers in real-world scenarios.
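The first step described above (detecting potential name binding pairs via string matching) and the split into unique and duplicate identifiers can be illustrated with a minimal sketch. This is not the authors' implementation. The `Identifier` record, the `normalize` rule (case- and separator-insensitive matching), and the criterion that an identifier counts as "duplicate" when its name matches more than one Java declaration are all assumptions made for illustration. The MyBatis-style XML mapper data is hypothetical; the abstract does not name the frameworks studied.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Identifier:
    """Hypothetical record for an identifier occurrence."""
    name: str      # identifier text as written in source
    language: str  # e.g. "java" or "xml"
    location: str  # "file:line", illustrative only


# A candidate binding: one non-Java identifier and every Java declaration it matches.
Candidate = Tuple[Identifier, List[Identifier]]


def normalize(name: str) -> str:
    """Lowercase and unify camelCase, kebab-case and snake_case (assumed matching rule)."""
    s = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)  # userName -> user_Name
    s = re.sub(r"[-\s]+", "_", s)                      # user-name -> user_name
    return s.lower()


def candidate_pairs(java_ids: List[Identifier], other_ids: List[Identifier]) -> List[Candidate]:
    """String-match non-Java identifiers against Java declarations by normalized name."""
    index: Dict[str, List[Identifier]] = defaultdict(list)
    for j in java_ids:
        index[normalize(j.name)].append(j)
    pairs: List[Candidate] = []
    for o in other_ids:
        matches = index.get(normalize(o.name), [])
        if matches:  # identifiers with no Java counterpart are dropped
            pairs.append((o, matches))
    return pairs


def split_unique_duplicate(pairs: List[Candidate]) -> Tuple[List[Candidate], List[Candidate]]:
    """Unique: exactly one Java match. Duplicate: several Java declarations share the name."""
    unique: List[Candidate] = []
    duplicate: List[Candidate] = []
    for o, matches in pairs:
        (unique if len(matches) == 1 else duplicate).append((o, matches))
    return unique, duplicate


if __name__ == "__main__":
    java = [
        Identifier("userName", "java", "User.java:5"),
        Identifier("id", "java", "User.java:4"),
        Identifier("id", "java", "Order.java:3"),
    ]
    xml = [
        Identifier("user_name", "xml", "UserMapper.xml:12"),
        Identifier("id", "xml", "UserMapper.xml:10"),
        Identifier("total", "xml", "OrderMapper.xml:8"),
    ]
    unique, duplicate = split_unique_duplicate(candidate_pairs(java, xml))
    for label, group in (("unique", unique), ("duplicate", duplicate)):
        for o, matches in group:
            print(label, o.location, "->", [m.location for m in matches])
```

In this toy input, `user_name` matches only the Java field `userName` and lands in the unique set, `id` matches two Java declarations and lands in the duplicate set, and `total` has no Java counterpart and is discarded. In the pipeline the abstract describes, unique pairs would then go to the BERT-based recognizer and duplicate pairs to the combined classifiers; neither learned stage is modeled here.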
