HGNNLink: recovering requirements-code traceability links with text and dependency-aware heterogeneous graph neural networks
Authors: Bangchao Wang, Zhiyuan Zou, Xuanxuan Liang, Huan Jin, Peng Liang
Journal: Automated Software Engineering, Volume 32, Issue 2 (journal article)
Published: 2025-05-31
DOI: 10.1007/s10515-025-00528-2
URL: https://link.springer.com/article/10.1007/s10515-025-00528-2
Journal metrics (as listed): impact factor 3.1; JCR Q3 (Computer Science, Software Engineering)
Citations: 0
Abstract
Manually recovering traceability links between requirements and code artifacts consumes substantial human effort. To reduce this cost, researchers have proposed automated methods, such as information retrieval (IR) and pre-trained models, that decide whether a traceability link exists based on the textual similarity between requirements and code artifacts. However, within a single system, developers often follow similar naming conventions and reuse the same frameworks and template code, so code artifacts that are functionally unrelated can still be highly similar textually. This makes it difficult to identify the code artifacts that correspond to a requirement from textual similarity alone, and motivates using the dependency relationships between code artifacts to assist requirements-code traceability link recovery. Existing methods typically apply dependency relationships only as a post-processing step that refines textual similarity, overlooking the role that both textual similarity and dependency relationships play in generating requirements-code traceability links. To address these limitations, we propose Heterogeneous Graph Neural Network Link (HGNNLink), a requirements traceability approach that uses vectors generated by pre-trained models as node features and treats IR similarity and dependency relationships as edge features. Using a heterogeneous graph neural network, HGNNLink aggregates both signals and dynamically weighs the influence of textual similarity and code dependencies on link generation. Experimental results on a dataset collected from ten open source software (OSS) projects show that HGNNLink improves the average F1 score by 13.36% over GA-XWCoDe, the current state-of-the-art (SOTA) method.
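The abstract describes the graph only at a high level, so the following is a minimal, hypothetical sketch of the idea, assuming a GAT-style attention step rather than HGNNLink's actual architecture. Requirement and code nodes carry toy vectors standing in for pre-trained embeddings, requirement-code edges carry a TF-IDF cosine score as the IR similarity feature, and code-code edges mark dependencies. The artifacts, `EMBED`, `TYPE_WEIGHT`, and the scoring rule are all invented for illustration; a trained model would learn the type weights and projections.

```python
import math
from collections import Counter

# Hypothetical toy artifacts (not from the paper's dataset).
REQUIREMENTS = {"R1": "user can reset password via email"}
CODE = {
    "PasswordResetService": "reset password token user",
    "EmailSender": "send email smtp message",
    "UserReportController": "user report controller list",
}
# Hypothetical dependency: PasswordResetService calls EmailSender.
DEPENDENCIES = [("PasswordResetService", "EmailSender")]
# Toy 3-dim vectors standing in for pre-trained-model embeddings (assumption).
EMBED = {
    "R1": [0.9, 0.4, 0.1],
    "PasswordResetService": [0.8, 0.3, 0.2],
    "EmailSender": [0.3, 0.9, 0.1],
    "UserReportController": [0.2, 0.1, 0.9],
}
# Hypothetical per-edge-type weights; a trained HGNN would learn these.
TYPE_WEIGHT = {"ir": 2.0, "dep": 1.0}


def tfidf(docs):
    """TF-IDF with smoothed IDF over whitespace tokens: a simple IR model."""
    toks = {k: v.split() for k, v in docs.items()}
    n = len(toks)
    df = Counter(t for ts in toks.values() for t in set(ts))
    return {
        k: {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in Counter(ts).items()}
        for k, ts in toks.items()
    }


def sparse_cos(u, v):
    """Cosine similarity between sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def dense_cos(u, v):
    """Cosine similarity between equal-length dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def build_edges():
    """Typed edges (u, v, type, feature): IR score on req-code, 1.0 on dependencies."""
    vecs = tfidf({**REQUIREMENTS, **CODE})
    edges = [(r, c, "ir", sparse_cos(vecs[r], vecs[c])) for r in REQUIREMENTS for c in CODE]
    edges += [(a, b, "dep", 1.0) for a, b in DEPENDENCIES]
    return edges


def aggregate(edges):
    """One attention-weighted aggregation step over the undirected typed graph."""
    nbrs = {node: [] for node in EMBED}
    for u, v, t, f in edges:
        nbrs[u].append((v, t, f))
        nbrs[v].append((u, t, f))
    new_h, attn = {}, {}
    for node, h in EMBED.items():
        if not nbrs[node]:
            new_h[node] = list(h)
            continue
        # Attention logit: type weight x edge feature + dot-product affinity.
        logits = [TYPE_WEIGHT[t] * f + sum(a * b for a, b in zip(h, EMBED[u]))
                  for u, t, f in nbrs[node]]
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]
        z = sum(exps)
        alphas = [e / z for e in exps]
        attn[node] = alphas
        # Message: attention-weighted sum of neighbor vectors.
        msg = [sum(al * EMBED[u][i] for al, (u, _, _) in zip(alphas, nbrs[node]))
               for i in range(len(h))]
        # Residual-style update: half self, half message (an assumed choice).
        new_h[node] = [0.5 * x + 0.5 * y for x, y in zip(h, msg)]
    return new_h, attn


edges = build_edges()
h, attn = aggregate(edges)
# Candidate link score: cosine between updated requirement and code vectors.
link_scores = {c: dense_cos(h["R1"], h[c]) for c in CODE}
for c, s in sorted(link_scores.items(), key=lambda kv: -kv[1]):
    print(f"R1 -> {c}: {s:.3f}")
```

In this toy graph, `PasswordResetService` has the highest IR score for `R1`, while `EmailSender` shares only the term "email" with it. The dependency edge lets `EmailSender` absorb part of `PasswordResetService`'s vector during aggregation, which is the kind of signal that text similarity alone does not capture.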
HGNNLink can also extend IR methods by using high-similarity candidate links as edges; after threshold parameters are configured using a genetic algorithm, the extended HGNNLink improves F1 by 2.48% over the original IR method.
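The abstract does not specify the genetic algorithm's encoding or operators. As a minimal sketch under that caveat, assume each candidate link has an IR similarity score and is predicted as a link when its score reaches a threshold; the hypothetical GA below evolves that threshold to maximize F1 on a small labeled set, using tournament selection, blend crossover, Gaussian mutation, and elitism. `CANDIDATES`, `ga_threshold`, and all GA parameters are invented for illustration.

```python
import random

# Hypothetical labeled candidate links: (IR similarity score, is_true_link).
CANDIDATES = [(0.91, True), (0.84, True), (0.77, False), (0.72, True),
              (0.65, False), (0.58, True), (0.41, False), (0.33, False),
              (0.27, False), (0.12, False)]


def f1_at(threshold, candidates):
    """F1 when every candidate scoring >= threshold is predicted as a link."""
    tp = sum(1 for s, y in candidates if s >= threshold and y)
    fp = sum(1 for s, y in candidates if s >= threshold and not y)
    fn = sum(1 for s, y in candidates if s < threshold and y)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def tournament(pop, candidates, rng):
    """Size-2 tournament: return the fitter of two randomly drawn thresholds."""
    a, b = rng.sample(pop, 2)
    return a if f1_at(a, candidates) >= f1_at(b, candidates) else b


def ga_threshold(candidates, pop_size=20, generations=30, sigma=0.05, seed=0):
    """Evolve real-valued thresholds in [0, 1] toward maximal F1."""
    rng = random.Random(seed)
    pop = [rng.random() for _ in range(pop_size)]
    for _gen in range(generations):
        children = []
        for _child in range(pop_size - 1):
            # Blend crossover of two parents, then Gaussian mutation, clipped to [0, 1].
            p1, p2 = tournament(pop, candidates, rng), tournament(pop, candidates, rng)
            w = rng.random()
            child = w * p1 + (1 - w) * p2 + rng.gauss(0.0, sigma)
            children.append(min(1.0, max(0.0, child)))
        # Elitism: the best threshold of the current generation survives unchanged.
        children.append(max(pop, key=lambda t: f1_at(t, candidates)))
        pop = children
    return max(pop, key=lambda t: f1_at(t, candidates))


best_t = ga_threshold(CANDIDATES)
print(f"threshold={best_t:.3f}  F1={f1_at(best_t, CANDIDATES):.3f}")
```

On these toy data the best achievable F1 is 0.8, reached by any threshold in (0.41, 0.58]; elitism ensures the returned threshold is never worse than the best one in any earlier generation. Under our assumption, candidates at or above the tuned threshold would be the high-similarity links used as edges.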
About the journal:
This journal publishes research, tutorial papers, surveys, and accounts of significant industrial experience in the foundations, techniques, tools, and applications of automated software engineering technology. This includes the study of techniques for constructing, understanding, adapting, and modeling software artifacts and processes.
Coverage in Automated Software Engineering examines both automatic systems and collaborative systems as well as computational models of human software engineering activities. In addition, it presents knowledge representations and artificial intelligence techniques applicable to automated software engineering, and formal techniques that support or provide theoretical foundations. The journal also includes reviews of books, software, conferences and workshops.