BERT4Anno: An annotation misuse detection method for Java

IF 4.3, Zone 2 (Computer Science), JCR Q2 (COMPUTER SCIENCE, INFORMATION SYSTEMS)

Information and Software Technology, Pub Date: 2025-05-01, DOI: 10.1016/j.infsof.2025.107763

Jingbo Yang, Xin Ji, Wenjun Wu, Xingchuang Liao, Kui Zhang, Linxiao Dong, Nan Xiang, Ren Jian

Volume 184, Article 107763. Full text: https://www.sciencedirect.com/science/article/pii/S0950584925001028

Citations: 0

Abstract


BERT4Anno: An annotation misuse detection method for Java

Context

Developers leverage Java annotations to implement functions such as creating objects and operating databases. However, mastering annotations is challenging, and misused annotations might cause an application to crash. Although state-of-the-art techniques attempt to solve this problem, they do not provide conclusions on Java annotation misuse types, nor do they leverage project-level information, which results in low efficiency in detecting annotation misuses.

Objective

To summarize Java annotation misuse types and provide a more efficient method for detecting misused annotations.

Method

First, to categorize Java annotation misuses, we conduct an empirical study and curate 321 annotation misuse questions from Stack Overflow. Second, to better detect these misuses, we propose a BERT-based method, BERT4Anno, which takes project structure and resource configuration into account, two factors often neglected by state-of-the-art methods. In BERT4Anno, a novel Annotation Usage Project Representation (AUPR) technique is designed to leverage the interconnections among source code, configuration, and project structure. Moreover, an AUPR-based Named Entity Recognition (ANER) task, implemented by fine-tuning BERT, is devised to learn annotation usage knowledge. With this knowledge, the fine-tuned model can detect misused annotations. Finally, to evaluate our proposed method, two datasets, mainly curated from GitHub and comprising 404 Java projects/files with annotation misuse instances, are used for the experiments.
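One way to picture the NER framing is as token-level labeling: annotations are stripped from the code, project context is prepended, and each remaining token carries a label naming the annotation that should precede it. The sketch below is only an illustration under those assumptions. The `[PATH]`/`[CONF]`/`[CODE]` markers, the `build_example` helper, the label scheme, and the Spring-style sample input are all hypothetical and are not the paper's actual AUPR format.

```python
from typing import List, Tuple


def build_example(path: str, config: List[str],
                  code_tokens: List[str]) -> Tuple[List[str], List[str]]:
    """Return (tokens, labels) for one file.

    Annotations are removed from the input tokens and recorded as the
    label of the token that follows them ("O" means no annotation).
    """
    # Hypothetical context prefix: file path plus configuration entries.
    tokens = ["[PATH]", path, "[CONF]", *config, "[CODE]"]
    labels = ["O"] * len(tokens)
    pending: List[str] = []  # annotations waiting for the next code token
    for tok in code_tokens:
        if tok.startswith("@"):
            pending.append(tok)  # keep as a label, not as an input token
            continue
        tokens.append(tok)
        labels.append("+".join(pending) if pending else "O")
        pending = []
    # An annotation with no following token is dropped in this sketch.
    return tokens, labels


# Illustrative input only; the annotation names are ordinary Spring ones.
tokens, labels = build_example(
    "src/main/java/demo/UserService.java",
    ["spring.datasource.url"],
    ["@Service", "public", "class", "UserService", "{",
     "@Autowired", "private", "UserRepo", "repo", ";", "}"],
)
for tok, lab in zip(tokens, labels):
    print(f"{tok:40s} {lab}")
```

A token-classification model fine-tuned on pairs like these would learn to predict the label column from the token column, which is the general shape of an NER task.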

Results

The Java annotation misuses are categorized into 14 types based on how the curated questions violate the correct annotation usage knowledge. The comparison experiment demonstrates the superior performance of our method over state-of-the-art baselines in terms of precision, recall, and F1 score, while our visualization technique provides insightful interpretations of the mechanism underlying the model’s outperformance.
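For reference, precision, recall, and F1 over a set of reported misuses can be computed as in the short sketch below. Representing a misuse as a `(file, line)` pair is an assumption made for illustration; the abstract does not state the paper's matching criterion, and the sample values are made up.

```python
from typing import Set, Tuple

Finding = Tuple[str, int]  # hypothetical (file name, line number) key


def precision_recall_f1(predicted: Set[Finding],
                        actual: Set[Finding]) -> Tuple[float, float, float]:
    """Standard set-based precision, recall, and F1."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up example: 3 reported misuses, 4 real ones, 2 in common.
reported = {("A.java", 3), ("A.java", 10), ("B.java", 7)}
real = {("A.java", 3), ("B.java", 7), ("C.java", 1), ("C.java", 2)}
p, r, f1 = precision_recall_f1(reported, real)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```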

Conclusion

By leveraging the project-level information, our proposed method can predict the appropriate types and positions of annotations and subsequently identify the misused annotations, making the detection more efficient.
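The predict-then-compare idea described here can be sketched as a diff between the annotations a model expects and those actually present. This is a simplified illustration: the `find_misuses` helper, the position keys, and the three finding kinds are hypothetical, and they do not correspond to the paper's 14 misuse types.

```python
from typing import Dict, List, Tuple


def find_misuses(actual: Dict[int, str],
                 predicted: Dict[int, str]) -> List[Tuple[int, str, str]]:
    """Compare annotations by position.

    Both arguments map a token position to an annotation name.
    Returns (position, kind, detail) tuples for every disagreement.
    """
    findings = []
    for pos in sorted(set(actual) | set(predicted)):
        have, want = actual.get(pos), predicted.get(pos)
        if have == want:
            continue
        if have is None:
            findings.append((pos, "missing", want))
        elif want is None:
            findings.append((pos, "unexpected", have))
        else:
            findings.append((pos, "wrong-type", f"{have} -> {want}"))
    return findings


# Made-up example: what the code contains vs. what a model would predict.
in_code = {5: "@Controller", 9: "@Autowired", 20: "@Override"}
from_model = {5: "@RestController", 9: "@Autowired", 14: "@Transactional"}
for finding in find_misuses(in_code, from_model):
    print(finding)
```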


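Source journal

Information and Software Technology (Engineering & Technology, Computer Science: Software Engineering)

CiteScore: 9.10

Self-citation rate: 7.70%

Articles published: 164

Review time: 9.6 weeks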

About the journal: Information and Software Technology is the international archival journal focusing on research and experience that contributes to the improvement of software development practices. The journal's scope includes methods and techniques to better engineer software and manage its development. Articles submitted for review should have a clear component of software engineering or address ways to improve the engineering and management of software development. Areas covered by the journal include:
• Software management, quality and metrics
• Software processes
• Software architecture, modelling, specification, design and programming
• Functional and non-functional software requirements
• Software testing and verification & validation
• Empirical studies of all aspects of engineering and managing software development

Short Communications is a new section dedicated to short papers addressing new ideas, controversial opinions, "Negative" results and much more. Read the Guide for authors for more information. The journal encourages and welcomes submissions of systematic literature studies (reviews and maps) within the scope of the journal. Information and Software Technology is the premier outlet for systematic literature studies in software engineering.