HGNNLink: recovering requirements-code traceability links with text and dependency-aware heterogeneous graph neural networks

IF 3.1 | Zone 2, Computer Science | JCR Q3, Computer Science, Software Engineering

Automated Software Engineering Pub Date : 2025-05-31 DOI:10.1007/s10515-025-00528-2

Bangchao Wang, Zhiyuan Zou, Xuanxuan Liang, Huan Jin, Peng Liang


Citations: 0

Abstract

Manually recovering traceability links between requirements and code artifacts often consumes substantial human resources. To address this, researchers have proposed automated methods based on textual similarity between requirements and code artifacts, such as information retrieval (IR) and pre-trained models, to determine whether traceability links exist between them. However, within the same system, developers often follow similar naming conventions and repeatedly use the same frameworks and template code, resulting in high textual similarity between code artifacts that are functionally unrelated. This makes it difficult to accurately identify the code artifacts corresponding to a requirements artifact based on textual similarity alone. Therefore, it is necessary to leverage the dependency relationships between code artifacts to assist in the requirements-code traceability link recovery process. Existing methods often treat dependency relationships as a post-processing step to refine textual similarity, overlooking the importance of textual similarity and dependency relationships in generating requirements-code traceability links. To address these limitations, we propose Heterogeneous Graph Neural Network Link (HGNNLink), a requirements traceability approach that uses vectors generated by pre-trained models as node features and considers IR similarity and dependency relationships as edge features. By employing a heterogeneous graph neural network, HGNNLink aggregates and dynamically evaluates the impact of textual similarity and code dependencies on link generation. The experimental results show that HGNNLink improves the average F1 score by 13.36% compared to the current state-of-the-art (SOTA) method GA-XWCoDe on a dataset collected from ten open-source software (OSS) projects. HGNNLink can extend IR methods by using high-similarity candidate links as edges, and the extended HGNNLink achieves a 2.48% improvement in F1 compared to the original IR method after threshold parameter configuration using a genetic algorithm.
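
The idea of combining node features with two kinds of edges can be illustrated with a minimal sketch. This is not the authors' implementation: it uses plain Python instead of a trained heterogeneous graph neural network, the fixed relation weights (`w_self`, `w_ir`, `w_dep`) stand in for weights a real model would learn, and every file name, feature vector, IR score, and the `IR_THRESHOLD` value is a hypothetical assumption chosen for illustration.

```python
import math

# Toy node features standing in for vectors from a pre-trained model (hypothetical values).
req_feats = {"R1": [1.0, 0.0, 0.0]}
code_feats = {
    "Login.java": [0.9, 0.1, 0.0],
    "Session.java": [0.2, 0.8, 0.0],
    "Report.java": [0.0, 0.1, 0.9],
}

# Hypothetical IR similarity scores for (requirement, code) pairs.
ir_scores = {
    ("R1", "Login.java"): 0.82,
    ("R1", "Session.java"): 0.35,
    ("R1", "Report.java"): 0.10,
}

# Code dependency edges (e.g. call or import relations), treated as undirected here.
dependencies = [("Login.java", "Session.java")]

# Assumed cut-off: only high-similarity candidate links become IR edges.
IR_THRESHOLD = 0.5


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def weighted_mean(vectors, weights):
    """Component-wise weighted average of a non-empty list of vectors."""
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[i] for v, w in zip(vectors, weights)) / total for i in range(dim)]


def aggregate_code_nodes(w_self=0.5, w_ir=0.3, w_dep=0.2):
    """One round of relation-wise message passing onto code nodes.

    The relation weights are fixed assumptions; a trained model would learn them.
    """
    ir_edges = {pair: s for pair, s in ir_scores.items() if s >= IR_THRESHOLD}
    updated = {}
    for code, feat in code_feats.items():
        parts, weights = [feat], [w_self]
        # Messages along IR-similarity edges, weighted by the similarity score.
        ir_nbrs = [(req_feats[r], s) for (r, c), s in ir_edges.items() if c == code]
        if ir_nbrs:
            parts.append(weighted_mean([v for v, _ in ir_nbrs], [s for _, s in ir_nbrs]))
            weights.append(w_ir)
        # Messages along dependency edges, averaged uniformly over neighbours.
        dep_nbrs = [code_feats[b if a == code else a] for a, b in dependencies if code in (a, b)]
        if dep_nbrs:
            parts.append(weighted_mean(dep_nbrs, [1.0] * len(dep_nbrs)))
            weights.append(w_dep)
        updated[code] = weighted_mean(parts, weights)
    return updated


def score_links(req):
    """Score each code artifact against a requirement using the aggregated features."""
    updated = aggregate_code_nodes()
    return {code: cosine(req_feats[req], vec) for code, vec in updated.items()}


scores = score_links("R1")
for code, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"R1 -> {code}: {s:.3f}")
```

In this toy setup, `Session.java` has no IR edge to `R1` because its score falls below the assumed threshold, yet its link score ends up higher than its raw cosine similarity because it depends on `Login.java`. That is the kind of effect dependency edges are meant to contribute, here with hand-picked rather than learned weights.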




Source journal

Automated Software Engineering (Engineering and Technology: Computer Science, Software Engineering)

CiteScore

4.80

Self-citation rate

11.80%

Publication volume

Review time

>12 weeks

Journal description: This journal publishes research papers, tutorial papers, surveys, and accounts of significant industrial experience in the foundations, techniques, tools, and applications of automated software engineering technology. This includes the study of techniques for constructing, understanding, adapting, and modeling software artifacts and processes. Coverage in Automated Software Engineering examines both automatic systems and collaborative systems as well as computational models of human software engineering activities. In addition, it presents knowledge representations and artificial intelligence techniques applicable to automated software engineering, and formal techniques that support or provide theoretical foundations. The journal also includes reviews of books, software, conferences, and workshops.