{"title":"基于大型语言模型的跨层需求跟踪","authors":"Chuyan Ge;Tiantian Wang;Xiaotian Yang;Christoph Treude","doi":"10.1109/TSE.2025.3572094","DOIUrl":null,"url":null,"abstract":"Cross-level requirements traceability, linking <bold>high-level requirements (HLRs)</b> and <bold>low-level requirements (LLRs)</b>, is essential for maintaining relationships and consistency in software development. However, the manual creation of requirements links necessitates a profound understanding of the project and entails a complex and laborious process. Existing machine learning and deep learning methods often fail to fully understand semantic information, leading to low accuracy and unstable performance. This paper presents the first approach for cross-level requirements tracing based on large language models (LLMs) and introduces a data augmentation strategy (such as synonym replacement, machine translation, and noise introduction) to enhance model robustness. We compare three fine-tuning strategies—LoRA, P-Tuning, and Prompt-Tuning—on different scales of LLaMA models (1.1B, 7B, and 13B). The fine-tuned LLMs exhibit superior performance across various datasets, including six single-project datasets, three cross-project datasets within the same domain, and one cross-domain dataset. Experimental results show that fine-tuned LLMs outperform traditional information retrieval, machine learning, and deep learning methods on various datasets. Furthermore, we compare the performance of GPT and DeepSeek LLMs under different prompt templates, revealing their high sensitivity to prompt design and relatively poor result stability. Our approach achieves superior performance, outperforming GPT-4o and DeepSeek-r1 by 16.27% and 16.8% in F-measure on cross-domain datasets. Compared to the baseline method that relies on prompt engineering, it achieves a maximum improvement of 13.8%.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"51 7","pages":"2044-2066"},"PeriodicalIF":5.6000,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Cross-Level Requirements Tracing Based on Large Language Models\",\"authors\":\"Chuyan Ge;Tiantian Wang;Xiaotian Yang;Christoph Treude\",\"doi\":\"10.1109/TSE.2025.3572094\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Cross-level requirements traceability, linking <bold>high-level requirements (HLRs)</b> and <bold>low-level requirements (LLRs)</b>, is essential for maintaining relationships and consistency in software development. However, the manual creation of requirements links necessitates a profound understanding of the project and entails a complex and laborious process. Existing machine learning and deep learning methods often fail to fully understand semantic information, leading to low accuracy and unstable performance. This paper presents the first approach for cross-level requirements tracing based on large language models (LLMs) and introduces a data augmentation strategy (such as synonym replacement, machine translation, and noise introduction) to enhance model robustness. We compare three fine-tuning strategies—LoRA, P-Tuning, and Prompt-Tuning—on different scales of LLaMA models (1.1B, 7B, and 13B). The fine-tuned LLMs exhibit superior performance across various datasets, including six single-project datasets, three cross-project datasets within the same domain, and one cross-domain dataset. 
Experimental results show that fine-tuned LLMs outperform traditional information retrieval, machine learning, and deep learning methods on various datasets. Furthermore, we compare the performance of GPT and DeepSeek LLMs under different prompt templates, revealing their high sensitivity to prompt design and relatively poor result stability. Our approach achieves superior performance, outperforming GPT-4o and DeepSeek-r1 by 16.27% and 16.8% in F-measure on cross-domain datasets. Compared to the baseline method that relies on prompt engineering, it achieves a maximum improvement of 13.8%.\",\"PeriodicalId\":13324,\"journal\":{\"name\":\"IEEE Transactions on Software Engineering\",\"volume\":\"51 7\",\"pages\":\"2044-2066\"},\"PeriodicalIF\":5.6000,\"publicationDate\":\"2025-03-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Software Engineering\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/11008781/\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, SOFTWARE ENGINEERING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Software Engineering","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/11008781/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
Cross-Level Requirements Tracing Based on Large Language Models
Cross-level requirements traceability, linking high-level requirements (HLRs) and low-level requirements (LLRs), is essential for maintaining relationships and consistency in software development. However, creating requirements links manually requires a deep understanding of the project and is a complex, laborious process. Existing machine learning and deep learning methods often fail to fully capture semantic information, leading to low accuracy and unstable performance. This paper presents the first approach to cross-level requirements tracing based on large language models (LLMs) and introduces data augmentation strategies (synonym replacement, machine translation, and noise introduction) to enhance model robustness. We compare three fine-tuning strategies (LoRA, P-Tuning, and Prompt-Tuning) on LLaMA models of different scales (1.1B, 7B, and 13B). The fine-tuned LLMs exhibit superior performance across various datasets: six single-project datasets, three cross-project datasets within the same domain, and one cross-domain dataset. Experimental results show that fine-tuned LLMs outperform traditional information retrieval, machine learning, and deep learning methods on these datasets. Furthermore, we compare the performance of GPT and DeepSeek LLMs under different prompt templates, revealing their high sensitivity to prompt design and relatively poor result stability. Our approach outperforms GPT-4o and DeepSeek-r1 by 16.27% and 16.8% in F-measure on the cross-domain dataset. Compared to a baseline method that relies on prompt engineering, it achieves a maximum improvement of 13.8%.
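To make the augmentation strategies named in the abstract concrete, the sketch below illustrates two of them (synonym replacement and noise introduction) applied to requirement text. This is a minimal, hypothetical illustration only: the paper's actual augmentation pipeline, synonym source, and noise parameters are not restated here, so every function name, table entry, and probability in this sketch is an assumption.

```python
import random

# Hypothetical synonym table; the paper's actual synonym source is an assumption here.
SYNONYMS = {
    "system": ["application", "software"],
    "shall": ["must", "should"],
    "user": ["operator", "end-user"],
}

def synonym_replacement(text: str, p: float = 0.3) -> str:
    """Replace words found in the synonym table with a random synonym, with probability p."""
    out = []
    for w in text.split():
        key = w.lower().strip(".,;")
        if key in SYNONYMS and random.random() < p:
            out.append(random.choice(SYNONYMS[key]))
        else:
            out.append(w)
    return " ".join(out)

def noise_introduction(text: str, p: float = 0.05) -> str:
    """Randomly drop characters to simulate noisy requirement text."""
    return "".join(c for c in text if random.random() > p)

hlr = "The system shall allow the user to export reports."
print(synonym_replacement(hlr))
print(noise_introduction(hlr))
```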
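Of the three fine-tuning strategies compared, LoRA is the most widely documented. The sketch below shows a minimal LoRA setup with the Hugging Face peft library, framing cross-level tracing as binary classification over an HLR-LLR pair. The base checkpoint (TinyLlama for the 1.1B scale), the target modules, the rank and alpha values, and the pair encoding are all assumptions for illustration, not the paper's reported configuration.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Assumed checkpoint for the 1.1B scale; the paper's exact base model is not restated here.
BASE = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

tokenizer = AutoTokenizer.from_pretrained(BASE)
tokenizer.pad_token = tokenizer.eos_token  # LLaMA tokenizers ship without a pad token
model = AutoModelForSequenceClassification.from_pretrained(BASE, num_labels=2)
model.config.pad_token_id = tokenizer.pad_token_id

# LoRA adapters on the attention projections; rank/alpha/dropout are illustrative values.
config = LoraConfig(
    task_type="SEQ_CLS",
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only the adapter weights are trainable

# One HLR-LLR pair encoded as a single sequence; label 1 would mean "traceable link".
inputs = tokenizer(
    "HLR: The system shall export reports. [SEP] LLR: Provide a CSV export endpoint.",
    return_tensors="pt",
)
logits = model(**inputs).logits
print(torch.softmax(logits, dim=-1))  # P(no-link), P(link) before any training
```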
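The comparison with GPT and DeepSeek rests on prompt templates, to which the abstract reports high sensitivity. The sketch below shows one hypothetical template for link classification through an OpenAI-compatible API, together with the standard F-measure used to score predictions. The template wording and the model name are assumptions; the paper's actual templates are not reproduced here.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A hypothetical prompt template; small wording changes can shift results noticeably.
TEMPLATE = (
    "You are a requirements analyst. Does the low-level requirement refine the "
    "high-level requirement? Answer only 'yes' or 'no'.\n"
    "High-level: {hlr}\nLow-level: {llr}"
)

def predict_link(hlr: str, llr: str, model: str = "gpt-4o") -> bool:
    """Classify one HLR-LLR pair by querying a chat model with the template above."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": TEMPLATE.format(hlr=hlr, llr=llr)}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip().lower().startswith("yes")

def f_measure(tp: int, fp: int, fn: int) -> float:
    """Standard F1: harmonic mean of precision and recall over predicted links."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```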
Journal Introduction:
IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include:
a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models.
b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects.
c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards.
d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues.
e) System issues: Hardware-software trade-offs.
f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.