Zhiyuan Zou, Bangchao Wang, Xinrong Hu, Yang Deng, Hongyan Wan, Huan Jin
Journal of King Saud University - Computer and Information Sciences, Vol. 36, No. 8, Article 102197 (published 1 October 2024). DOI: 10.1016/j.jksuci.2024.102197
Citations: 0
Enhancing requirements-to-code traceability with GA-XWCoDe: Integrating XGBoost, Node2Vec, and genetic algorithms for improving model performance and stability
This study addresses the challenge of requirements-to-code traceability by proposing a novel model, Genetic Algorithm-XGBoost With Code Dependency (GA-XWCoDe), which integrates eXtreme Gradient Boosting (XGBoost) with a Node2Vec model-weighted code dependency strategy and uses genetic algorithms for parameter optimisation. XGBoost mitigates overfitting and enhances model stability, while Node2Vec improves prediction accuracy for low-confidence links. Genetic algorithms optimise the model parameters efficiently, reducing the resource demands of traditional methods. Experimental results show that GA-XWCoDe outperforms the state-of-the-art method TRAceability lInk cLassifier (TRAIL) by 17.44% and Deep Forest for Requirement traceability (DF4RT) by 33.36% in average F1 score across four datasets. Its advantage over all baseline methods is statistically significant at a significance level of α < 0.01, and it maintains strong performance and stability across training sets of different sizes.
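The abstract states that genetic algorithms tune the model's parameters but does not give the encoding, operators, or fitness function. As a minimal sketch, assuming a dict-encoded individual, uniform crossover, per-gene resampling mutation, tournament selection, and elitism, the loop below maximises a fitness function over a few XGBoost hyperparameters. The parameter names are real XGBoost arguments, but the ranges in `SEARCH_SPACE` are assumptions, and `toy_fitness` is a hypothetical stand-in for the validation F1 a trained classifier would report.

```python
import random

# Hypothetical search space: names match real XGBoost hyperparameters,
# but the ranges are illustrative assumptions, not the paper's settings.
SEARCH_SPACE = {
    "max_depth": (2, 10),          # integer gene
    "learning_rate": (0.01, 0.3),  # float gene
    "n_estimators": (50, 500),     # integer gene
}


def sample_gene(lo, hi, rng):
    """Draw one gene value: integer if the bounds are ints, else float."""
    return rng.randint(lo, hi) if isinstance(lo, int) else rng.uniform(lo, hi)


def random_individual(rng):
    """Sample one candidate parameter set uniformly from SEARCH_SPACE."""
    return {k: sample_gene(lo, hi, rng) for k, (lo, hi) in SEARCH_SPACE.items()}


def crossover(a, b, rng):
    """Uniform crossover: each gene is copied from a randomly chosen parent."""
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in SEARCH_SPACE}


def mutate(ind, rng, rate=0.2):
    """Resample each gene independently with probability `rate`."""
    child = dict(ind)
    for k, (lo, hi) in SEARCH_SPACE.items():
        if rng.random() < rate:
            child[k] = sample_gene(lo, hi, rng)
    return child


def genetic_search(fitness, pop_size=20, generations=30, elite=2, seed=0):
    """Maximise `fitness` over SEARCH_SPACE with a simple elitist GA."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_gen = ranked[:elite]  # elitism: carry the best over unchanged
        while len(next_gen) < pop_size:
            # Tournament selection (size 3) for each parent.
            p1 = max(rng.sample(ranked, 3), key=fitness)
            p2 = max(rng.sample(ranked, 3), key=fitness)
            next_gen.append(mutate(crossover(p1, p2, rng), rng))
        population = next_gen
    return max(population, key=fitness)


def toy_fitness(params):
    """Hypothetical stand-in for validation F1; peaks at depth 6, lr 0.1, 200 trees."""
    return -(
        (params["max_depth"] - 6) ** 2
        + ((params["learning_rate"] - 0.1) * 100) ** 2
        + ((params["n_estimators"] - 200) / 50) ** 2
    )


if __name__ == "__main__":
    best = genetic_search(toy_fitness)
    print(best, toy_fitness(best))
```

In a real pipeline, the fitness function would train a classifier such as `xgboost.XGBClassifier(**params)` on the training links and return its validation F1. Because the elite individuals are copied unchanged into each generation, the best score found never decreases from one generation to the next.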
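The abstract credits Node2Vec with improving predictions for low-confidence links but does not describe how the code dependency graph is built or weighted. The sketch below shows only the standard Node2Vec ingredient: the second-order biased random walk controlled by the return parameter `p` and the in-out parameter `q`, run over a hypothetical weighted class-dependency graph. The class names and edge weights in `DEPS` are invented for illustration, and how GA-XWCoDe turns the resulting embeddings into link weights is not shown.

```python
import random

# Hypothetical undirected code dependency graph: node -> {neighbour: weight}.
# Class names and weights are invented for illustration only.
DEPS = {
    "LoginController": {"UserService": 2.0, "AuditLog": 1.0},
    "UserService": {"LoginController": 2.0, "UserRepository": 3.0, "AuditLog": 1.0},
    "UserRepository": {"UserService": 3.0},
    "AuditLog": {"LoginController": 1.0, "UserService": 1.0},
}


def node2vec_walk(graph, start, length, p=1.0, q=1.0, rng=None):
    """Generate one Node2Vec second-order biased random walk.

    p: return parameter; a larger p makes stepping back to the previous node less likely.
    q: in-out parameter; q > 1 keeps the walk near the previous node,
       q < 1 pushes it outward.
    """
    if rng is None:
        rng = random.Random()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbours = list(graph[cur].items())
        if not neighbours:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            # First step: no previous node, so use plain edge weights.
            weights = [w for _, w in neighbours]
        else:
            prev = walk[-2]
            weights = []
            for nxt, w in neighbours:
                if nxt == prev:
                    weights.append(w / p)  # distance 0 from prev: return
                elif nxt in graph[prev]:
                    weights.append(w)      # distance 1 from prev: stay close
                else:
                    weights.append(w / q)  # distance 2 from prev: move outward
        nodes = [n for n, _ in neighbours]
        walk.append(rng.choices(nodes, weights=weights, k=1)[0])
    return walk


if __name__ == "__main__":
    rng = random.Random(42)
    for node in DEPS:
        print(" -> ".join(node2vec_walk(DEPS, node, 6, p=1.0, q=0.5, rng=rng)))
```

In standard Node2Vec, many such walks per node are treated as sentences and fed to a skip-gram model (for example gensim's `Word2Vec`) to learn one embedding per node. This sketch recomputes transition weights at every step for clarity, whereas the reference implementation precomputes alias tables for faster sampling.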
About the journal:
In 2022 the Journal of King Saud University - Computer and Information Sciences will become an author-paid open access journal. Authors who submit their manuscript after October 31, 2021 will be asked to pay an Article Processing Charge (APC) after acceptance of their paper to make their work immediately, permanently, and freely accessible to all. The Journal of King Saud University - Computer and Information Sciences is a refereed, international journal that covers all aspects of the foundations of computing and their practical applications.