From attack descriptions to vulnerabilities: A sentence transformer-based approach

IF 4.1; Zone 2 (Computer Science); Q1, COMPUTER SCIENCE, SOFTWARE ENGINEERING

Journal of Systems and Software; Pub Date: 2025-09-09; DOI: 10.1016/j.jss.2025.112615

Refat Othman, Diaeddin Rimawi, Bruno Rossi, Barbara Russo

Journal of Systems and Software, volume 231, Article 112615. Publication type: Journal Article. Open access: no. Full text: https://www.sciencedirect.com/science/article/pii/S0164121225002845

Citations: 0

Abstract

In the domain of security, vulnerabilities frequently remain undetected even after their exploitation. In this work, vulnerabilities refer to publicly disclosed flaws documented in Common Vulnerabilities and Exposures (CVE) reports. Establishing a connection between attacks and vulnerabilities is essential for enabling timely incident response, as it provides defenders with immediate, actionable insights. However, manually mapping attacks to CVEs is infeasible, thereby motivating the need for automation. This paper evaluates 14 state-of-the-art (SOTA) sentence transformers for automatically identifying vulnerabilities from textual descriptions of attacks. Our results demonstrate that the multi-qa-mpnet-base-dot-v1 (MMPNet) model achieves superior classification performance when using attack Technique descriptions, with an F1-score of 89.0, precision of 84.0, and recall of 94.7. Furthermore, it was observed that, on average, 56% of the vulnerabilities identified by the MMPNet model are also represented within the CVE repository in conjunction with an attack, while 61% of the vulnerabilities detected by the model correspond to those cataloged in the CVE repository. A manual inspection of the results revealed the existence of 275 predicted links that were not documented in the MITRE repositories. Consequently, the automation of linking attack techniques to vulnerabilities not only enhances the detection and response capabilities related to software security incidents but also diminishes the duration during which vulnerabilities remain exploitable, thereby contributing to the development of more secure systems.
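As a quick consistency check on the figures reported in the abstract, the F1-score can be recomputed from the stated precision and recall using the standard harmonic-mean definition. The helper name `f1_score` below is hypothetical and is not taken from the paper; this is only a sketch of the arithmetic, not the authors' evaluation code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same scale as the inputs)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, expressed in percent.
precision, recall = 84.0, 94.7
f1 = f1_score(precision, recall)
print(round(f1, 1))  # prints 89.0
```

The recomputed value agrees with the reported F1-score of 89.0 to one decimal place.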




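Source journal

Journal of Systems and Software (Engineering & Technology, Computer Science: Theory & Methods)

CiteScore: 8.60

Self-citation rate: 5.70%

Articles published: 193

Review time: 16 weeks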

About the journal: The Journal of Systems and Software publishes papers covering all aspects of software engineering and related hardware-software-systems issues. All articles should include a validation of the idea presented, e.g. through case studies, experiments, or systematic comparisons with other approaches already in practice. Topics of interest include, but are not limited to:
• Methods and tools for, and empirical studies on, software requirements, design, architecture, verification and validation, maintenance and evolution
• Agile, model-driven, service-oriented, open source and global software development
• Approaches for mobile, multiprocessing, real-time, distributed, cloud-based, dependable and virtualized systems
• Human factors and management concerns of software development
• Data management and big data issues of software systems
• Metrics and evaluation, data mining of software development resources
• Business and economic aspects of software development processes
The journal welcomes state-of-the-art surveys and reports of practical experience for all of these topics.