{"title":"基于大型语言模型的跨层需求跟踪","authors":"Chuyan Ge;Tiantian Wang;Xiaotian Yang;Christoph Treude","doi":"10.1109/TSE.2025.3572094","DOIUrl":null,"url":null,"abstract":"Cross-level requirements traceability, linking <bold>high-level requirements (HLRs)</b> and <bold>low-level requirements (LLRs)</b>, is essential for maintaining relationships and consistency in software development. However, the manual creation of requirements links necessitates a profound understanding of the project and entails a complex and laborious process. Existing machine learning and deep learning methods often fail to fully understand semantic information, leading to low accuracy and unstable performance. This paper presents the first approach for cross-level requirements tracing based on large language models (LLMs) and introduces a data augmentation strategy (such as synonym replacement, machine translation, and noise introduction) to enhance model robustness. We compare three fine-tuning strategies—LoRA, P-Tuning, and Prompt-Tuning—on different scales of LLaMA models (1.1B, 7B, and 13B). The fine-tuned LLMs exhibit superior performance across various datasets, including six single-project datasets, three cross-project datasets within the same domain, and one cross-domain dataset. Experimental results show that fine-tuned LLMs outperform traditional information retrieval, machine learning, and deep learning methods on various datasets. Furthermore, we compare the performance of GPT and DeepSeek LLMs under different prompt templates, revealing their high sensitivity to prompt design and relatively poor result stability. Our approach achieves superior performance, outperforming GPT-4o and DeepSeek-r1 by 16.27% and 16.8% in F-measure on cross-domain datasets. Compared to the baseline method that relies on prompt engineering, it achieves a maximum improvement of 13.8%.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"51 7","pages":"2044-2066"},"PeriodicalIF":5.6000,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Cross-Level Requirements Tracing Based on Large Language Models\",\"authors\":\"Chuyan Ge;Tiantian Wang;Xiaotian Yang;Christoph Treude\",\"doi\":\"10.1109/TSE.2025.3572094\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Cross-level requirements traceability, linking <bold>high-level requirements (HLRs)</b> and <bold>low-level requirements (LLRs)</b>, is essential for maintaining relationships and consistency in software development. However, the manual creation of requirements links necessitates a profound understanding of the project and entails a complex and laborious process. Existing machine learning and deep learning methods often fail to fully understand semantic information, leading to low accuracy and unstable performance. This paper presents the first approach for cross-level requirements tracing based on large language models (LLMs) and introduces a data augmentation strategy (such as synonym replacement, machine translation, and noise introduction) to enhance model robustness. We compare three fine-tuning strategies—LoRA, P-Tuning, and Prompt-Tuning—on different scales of LLaMA models (1.1B, 7B, and 13B). The fine-tuned LLMs exhibit superior performance across various datasets, including six single-project datasets, three cross-project datasets within the same domain, and one cross-domain dataset. 
Experimental results show that fine-tuned LLMs outperform traditional information retrieval, machine learning, and deep learning methods on various datasets. Furthermore, we compare the performance of GPT and DeepSeek LLMs under different prompt templates, revealing their high sensitivity to prompt design and relatively poor result stability. Our approach achieves superior performance, outperforming GPT-4o and DeepSeek-r1 by 16.27% and 16.8% in F-measure on cross-domain datasets. Compared to the baseline method that relies on prompt engineering, it achieves a maximum improvement of 13.8%.\",\"PeriodicalId\":13324,\"journal\":{\"name\":\"IEEE Transactions on Software Engineering\",\"volume\":\"51 7\",\"pages\":\"2044-2066\"},\"PeriodicalIF\":5.6000,\"publicationDate\":\"2025-03-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Software Engineering\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/11008781/\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, SOFTWARE ENGINEERING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Software Engineering","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/11008781/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
Cross-Level Requirements Tracing Based on Large Language Models
Cross-level requirements traceability, linking high-level requirements (HLRs) and low-level requirements (LLRs), is essential for maintaining relationships and consistency in software development. However, creating requirements links manually requires a deep understanding of the project and is a complex, laborious process. Existing machine learning and deep learning methods often fail to fully capture semantic information, leading to low accuracy and unstable performance. This paper presents the first approach to cross-level requirements tracing based on large language models (LLMs) and introduces data augmentation strategies (synonym replacement, machine translation, and noise introduction) to enhance model robustness. We compare three fine-tuning strategies (LoRA, P-Tuning, and Prompt-Tuning) on LLaMA models of different scales (1.1B, 7B, and 13B). The fine-tuned LLMs exhibit superior performance across various datasets: six single-project datasets, three cross-project datasets within the same domain, and one cross-domain dataset. Experimental results show that fine-tuned LLMs outperform traditional information retrieval, machine learning, and deep learning methods on these datasets. Furthermore, we compare the performance of GPT and DeepSeek LLMs under different prompt templates, revealing their high sensitivity to prompt design and relatively poor result stability. Our approach outperforms GPT-4o and DeepSeek-r1 by 16.27% and 16.8% in F-measure on the cross-domain dataset. Compared to a baseline method that relies on prompt engineering, it achieves a maximum improvement of 13.8%.
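To make the augmentation strategies named in the abstract concrete, the sketch below illustrates two of them (synonym replacement and noise introduction) applied to requirement text. This is a minimal, hypothetical illustration only: the paper's actual augmentation pipeline, synonym source, and noise parameters are not restated here, so every function name, table entry, and probability in this sketch is an assumption.

```python
import random

# Hypothetical synonym table; the paper's actual synonym source is an assumption here.
SYNONYMS = {
    "system": ["application", "software"],
    "shall": ["must", "should"],
    "user": ["operator", "end-user"],
}

def synonym_replacement(text: str, p: float = 0.3) -> str:
    """Replace words found in the synonym table with a random synonym, with probability p."""
    out = []
    for w in text.split():
        key = w.lower().strip(".,;")
        if key in SYNONYMS and random.random() < p:
            out.append(random.choice(SYNONYMS[key]))
        else:
            out.append(w)
    return " ".join(out)

def noise_introduction(text: str, p: float = 0.05) -> str:
    """Randomly drop characters to simulate noisy requirement text."""
    return "".join(c for c in text if random.random() > p)

hlr = "The system shall allow the user to export reports."
print(synonym_replacement(hlr))
print(noise_introduction(hlr))
```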
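Of the three fine-tuning strategies compared, LoRA is the most widely documented. The sketch below shows a minimal LoRA setup with the Hugging Face peft library, framing cross-level tracing as binary classification over an HLR-LLR pair. The base checkpoint (TinyLlama for the 1.1B scale), the target modules, the rank and alpha values, and the pair encoding are all assumptions for illustration, not the paper's reported configuration.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Assumed checkpoint for the 1.1B scale; the paper's exact base model is not restated here.
BASE = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

tokenizer = AutoTokenizer.from_pretrained(BASE)
tokenizer.pad_token = tokenizer.eos_token  # LLaMA tokenizers ship without a pad token
model = AutoModelForSequenceClassification.from_pretrained(BASE, num_labels=2)
model.config.pad_token_id = tokenizer.pad_token_id

# LoRA adapters on the attention projections; rank/alpha/dropout are illustrative values.
config = LoraConfig(
    task_type="SEQ_CLS",
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only the adapter weights are trainable

# One HLR-LLR pair encoded as a single sequence; label 1 would mean "traceable link".
inputs = tokenizer(
    "HLR: The system shall export reports. [SEP] LLR: Provide a CSV export endpoint.",
    return_tensors="pt",
)
logits = model(**inputs).logits
print(torch.softmax(logits, dim=-1))  # P(no-link), P(link) before any training
```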
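The comparison with GPT and DeepSeek rests on prompt templates, to which the abstract reports high sensitivity. The sketch below shows one hypothetical template for link classification through an OpenAI-compatible API, together with the standard F-measure used to score predictions. The template wording and the model name are assumptions; the paper's actual templates are not reproduced here.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A hypothetical prompt template; small wording changes can shift results noticeably.
TEMPLATE = (
    "You are a requirements analyst. Does the low-level requirement refine the "
    "high-level requirement? Answer only 'yes' or 'no'.\n"
    "High-level: {hlr}\nLow-level: {llr}"
)

def predict_link(hlr: str, llr: str, model: str = "gpt-4o") -> bool:
    """Classify one HLR-LLR pair by querying a chat model with the template above."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": TEMPLATE.format(hlr=hlr, llr=llr)}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip().lower().startswith("yes")

def f_measure(tp: int, fp: int, fn: int) -> float:
    """Standard F1: harmonic mean of precision and recall over predicted links."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```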
Journal Introduction:
IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include:
a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models.
b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects.
c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards.
d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues.
e) System issues: Hardware-software trade-offs.
f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.