Francesco Casillo, Vincenzo Deufemia, Carmine Gravino
{"title":"Beyond domain dependency in security requirements identification","authors":"Francesco Casillo, Vincenzo Deufemia, Carmine Gravino","doi":"10.1016/j.infsof.2025.107702","DOIUrl":null,"url":null,"abstract":"<div><h3>Context:</h3><div>Early security requirements identification is crucial in software development, facilitating the integration of security measures into IT networks and reducing time and costs throughout software life-cycle.</div></div><div><h3>Objectives:</h3><div>This paper addresses the limitations of existing methods that leverage Natural Language Processing (NLP) and machine learning techniques for detecting security requirements. These methods often fall short in capturing syntactic and semantic relationships, face challenges in adapting across domains, and rely heavily on extensive domain-specific data. In this paper we focus on identifying the most effective approaches for this task, highlighting both domain-specific and domain-independent strategies.</div></div><div><h3>Method:</h3><div>Our methodology encompasses two primary streams of investigation. First, we explore shallow machine learning techniques, leveraging word embeddings. We test ensemble methods and grid search within and across domains, evaluating on three industrial datasets. Next, we develop several domain-independent models based on BERT, tailored to better detect security requirements by incorporating data on software weaknesses and vulnerabilities.</div></div><div><h3>Results:</h3><div>Our findings reveal that ensemble and grid search methods prove effective in domain-specific and domain-independent experiments, respectively. However, our custom BERT models showcase domain independence and adaptability. Notably, the CweCveCodeBERT model excels in Precision and F1-score, outperforming existing approaches significantly. It improves F1-score by <span><math><mo>∼</mo></math></span>3% and Precision by <span><math><mo>∼</mo></math></span>14% over the best approach currently in the literature.</div></div><div><h3>Conclusion:</h3><div>BERT-based models, especially with specialized pre-training, show promise for automating security requirement detection. This establishes a foundation for software engineering researchers and practitioners to utilize advanced NLP to improve security in early development phases, fostering the adoption of these state-of-the-art methods in real-world scenarios.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"182 ","pages":"Article 107702"},"PeriodicalIF":3.8000,"publicationDate":"2025-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information and Software Technology","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0950584925000412","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
Context:
Early security requirements identification is crucial in software development, facilitating the integration of security measures into IT networks and reducing time and costs throughout software life-cycle.
Objectives:
This paper addresses the limitations of existing methods that leverage Natural Language Processing (NLP) and machine learning techniques for detecting security requirements. These methods often fall short in capturing syntactic and semantic relationships, face challenges in adapting across domains, and rely heavily on extensive domain-specific data. In this paper we focus on identifying the most effective approaches for this task, highlighting both domain-specific and domain-independent strategies.
Method:
Our methodology encompasses two primary streams of investigation. First, we explore shallow machine learning techniques, leveraging word embeddings. We test ensemble methods and grid search within and across domains, evaluating on three industrial datasets. Next, we develop several domain-independent models based on BERT, tailored to better detect security requirements by incorporating data on software weaknesses and vulnerabilities.
Results:
Our findings reveal that ensemble and grid search methods prove effective in domain-specific and domain-independent experiments, respectively. However, our custom BERT models showcase domain independence and adaptability. Notably, the CweCveCodeBERT model excels in Precision and F1-score, outperforming existing approaches significantly. It improves F1-score by 3% and Precision by 14% over the best approach currently in the literature.
Conclusion:
BERT-based models, especially with specialized pre-training, show promise for automating security requirement detection. This establishes a foundation for software engineering researchers and practitioners to utilize advanced NLP to improve security in early development phases, fostering the adoption of these state-of-the-art methods in real-world scenarios.
期刊介绍:
Information and Software Technology is the international archival journal focusing on research and experience that contributes to the improvement of software development practices. The journal''s scope includes methods and techniques to better engineer software and manage its development. Articles submitted for review should have a clear component of software engineering or address ways to improve the engineering and management of software development. Areas covered by the journal include:
• Software management, quality and metrics,
• Software processes,
• Software architecture, modelling, specification, design and programming
• Functional and non-functional software requirements
• Software testing and verification & validation
• Empirical studies of all aspects of engineering and managing software development
Short Communications is a new section dedicated to short papers addressing new ideas, controversial opinions, "Negative" results and much more. Read the Guide for authors for more information.
The journal encourages and welcomes submissions of systematic literature studies (reviews and maps) within the scope of the journal. Information and Software Technology is the premiere outlet for systematic literature studies in software engineering.