Beyond domain dependency in security requirements identification

IF 3.8 · Zone 2, Computer Science · JCR Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS

Information and Software Technology · Pub Date: 2025-03-14 · DOI: 10.1016/j.infsof.2025.107702

Francesco Casillo, Vincenzo Deufemia, Carmine Gravino

Volume 182, Article 107702 · https://www.sciencedirect.com/science/article/pii/S0950584925000412

Citations: 0

Abstract

Context:

Identifying security requirements early is crucial in software development: it facilitates the integration of security measures into IT networks and reduces time and costs throughout the software life cycle.

Objectives:

This paper addresses the limitations of existing methods that leverage Natural Language Processing (NLP) and machine learning techniques for detecting security requirements. These methods often fall short in capturing syntactic and semantic relationships, adapt poorly across domains, and rely heavily on extensive domain-specific data. In this paper, we focus on identifying the most effective approaches for this task, highlighting both domain-specific and domain-independent strategies.

Method:

Our methodology encompasses two primary streams of investigation. First, we explore shallow machine learning techniques, leveraging word embeddings. We test ensemble methods and grid search within and across domains, evaluating on three industrial datasets. Next, we develop several domain-independent models based on BERT, tailored to better detect security requirements by incorporating data on software weaknesses and vulnerabilities.
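The within-domain versus cross-domain distinction can be made concrete with a small sketch. The abstract does not specify the evaluation protocol, so the code below assumes a leave-one-domain-out scheme: train on two datasets and test on the third. The function name `cross_domain_splits`, the domain labels, and the toy requirements are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# A requirement is a (text, is_security) pair. Illustrative type, not from the paper.
Requirement = Tuple[str, bool]
Split = Tuple[str, List[Requirement], List[Requirement]]


def cross_domain_splits(datasets: Dict[str, List[Requirement]]) -> List[Split]:
    """Return one (held_out_domain, train, test) split per domain.

    Assumed protocol: each domain is held out once as the test set while
    the remaining domains are pooled for training.
    """
    splits: List[Split] = []
    names = sorted(datasets)
    for held_out in names:
        train = [req for name in names if name != held_out for req in datasets[name]]
        test = list(datasets[held_out])
        splits.append((held_out, train, test))
    return splits


# Toy placeholder data standing in for the three industrial datasets.
toy = {
    "domain_a": [("The system shall encrypt stored passwords.", True)],
    "domain_b": [("The UI shall use a blue theme.", False)],
    "domain_c": [("Sessions shall expire after 15 minutes of inactivity.", True)],
}

splits = cross_domain_splits(toy)
for held_out, train, test in splits:
    print(f"held out: {held_out}, train size: {len(train)}, test size: {len(test)}")
```

A within-domain experiment would instead split each dataset on its own, for example with k-fold cross-validation, so that training and test requirements come from the same domain.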

Results:

Our findings reveal that ensemble and grid search methods prove effective in domain-specific and domain-independent experiments, respectively. However, our custom BERT models showcase domain independence and adaptability. Notably, the CweCveCodeBERT model excels in Precision and F1-score, outperforming existing approaches significantly. It improves F1-score by ~3% and Precision by ~14% over the best approach currently in the literature.
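For readers less familiar with the two reported metrics, the following is a minimal sketch of how Precision and F1-score are computed for a binary "security requirement or not" classifier. It uses the standard textbook definitions; the labels and predictions are made-up toy values, not results from the paper, and the abstract does not state whether its improvements are absolute or relative.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[bool], y_pred: Sequence[bool]
) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 for the positive (security) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)        # true positives
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)    # false positives
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)    # false negatives

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1


# Toy labels: True means "security requirement". Values are illustrative only.
y_true = [True, True, True, False, False, False]
y_pred = [True, True, False, True, False, False]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Precision penalizes false alarms (non-security requirements flagged as security), while F1 balances precision against recall, which is why a model can gain far more in one than in the other.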

Conclusion:

BERT-based models, especially with specialized pre-training, show promise for automating security requirement detection. This establishes a foundation for software engineering researchers and practitioners to utilize advanced NLP to improve security in early development phases, fostering the adoption of these state-of-the-art methods in real-world scenarios.




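Source journal

Information and Software Technology (Engineering & Technology - Computer Science: Software Engineering)

CiteScore: 9.10

Self-citation rate: 7.70%

Articles published: 164

Review time: 9.6 weeks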

About the journal:

Information and Software Technology is the international archival journal focusing on research and experience that contributes to the improvement of software development practices. The journal's scope includes methods and techniques to better engineer software and manage its development. Articles submitted for review should have a clear component of software engineering or address ways to improve the engineering and management of software development. Areas covered by the journal include:

• Software management, quality and metrics
• Software processes
• Software architecture, modelling, specification, design and programming
• Functional and non-functional software requirements
• Software testing and verification & validation
• Empirical studies of all aspects of engineering and managing software development

Short Communications is a new section dedicated to short papers addressing new ideas, controversial opinions, "negative" results and much more. Read the Guide for Authors for more information.

The journal encourages and welcomes submissions of systematic literature studies (reviews and maps) within the scope of the journal. Information and Software Technology is the premier outlet for systematic literature studies in software engineering.