{"title":"A comparative analysis on using GPT and BERT for automated vulnerability scoring","authors":"Seyedeh Leili Mirtaheri , Andrea Pugliese , Narges Movahed , Reza Shahbazian","doi":"10.1016/j.iswa.2025.200515","DOIUrl":null,"url":null,"abstract":"<div><div>Large language models and transformers such as GPT and BERT have shown great improvements in many domains including cybersecurity. A constantly increasing number of vulnerabilities necessitate automated vulnerability scoring systems. Therefore, a deeper understanding of GPT and BERT compatibility with the requirements of the cybersecurity domain seems inevitable for system designers. The BERT model’s family is known to be optimized in understanding the contextual relationships with a bidirectional approach, while the GPT models perform unidirectional processing with generative capabilities. Automated vulnerability scoring systems require both the features to analyze the vulnerability and to augment the vulnerability descriptions. On the other hand, powerful GPT models are often more “resource-intensive in comparison with the BERT family. This paper presents a comprehensive comparison analysis of GPT and BERT in terms of their text classification performance, utilizing the vulnerability description classification task. We outline a thorough theoretical and experimental comparison of the models, regarding their architectures, training objectives, and fine-tuning, as well as their text classification performance. We evaluate these models on the vulnerability description classification task and employ rigorous evaluation metrics to shed light on their relative strengths and shortcomings. We also evaluate the hybrid architectures that benefit from combining GPT and BERT at the same time. Our experiment results show that they can effectively leverage the complementary strengths of both GPT and BERT, namely generative and comprehension, leading to further improvements in classification performance.</div></div>","PeriodicalId":100684,"journal":{"name":"Intelligent Systems with Applications","volume":"26 ","pages":"Article 200515"},"PeriodicalIF":0.0000,"publicationDate":"2025-04-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Intelligent Systems with Applications","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2667305325000419","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Large language models and transformer architectures such as GPT and BERT have driven substantial improvements in many domains, including cybersecurity. The constantly increasing number of vulnerabilities necessitates automated vulnerability scoring systems, so a deeper understanding of how well GPT and BERT fit the requirements of the cybersecurity domain is essential for system designers. The BERT model family is optimized for understanding contextual relationships through a bidirectional approach, while GPT models perform unidirectional processing with generative capabilities. Automated vulnerability scoring systems need both capabilities: analyzing vulnerabilities and augmenting vulnerability descriptions. On the other hand, powerful GPT models are often more resource-intensive than the BERT family. This paper presents a comprehensive comparative analysis of GPT and BERT in terms of their text classification performance, using the vulnerability description classification task. We outline a thorough theoretical and experimental comparison of the models, covering their architectures, training objectives, and fine-tuning, as well as their text classification performance. We evaluate these models on the vulnerability description classification task and employ rigorous evaluation metrics to shed light on their relative strengths and shortcomings. We also evaluate hybrid architectures that combine GPT and BERT. Our experimental results show that these hybrids can effectively leverage the complementary strengths of GPT and BERT, namely generation and comprehension, leading to further improvements in classification performance.
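
To make the classification setup concrete, the following is a minimal sketch of sequence classification with BERT on a vulnerability description, using the Hugging Face transformers library. The checkpoint name, the four-class severity label set, and the sample description are illustrative assumptions, not the paper's exact configuration.

# Minimal sketch (not the paper's exact setup): classify a vulnerability
# description with a fine-tunable BERT classification head.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

NUM_CLASSES = 4  # assumption: e.g. four severity bands (low/medium/high/critical)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=NUM_CLASSES
)  # classification head is randomly initialized until fine-tuned

description = (
    "A buffer overflow in the XYZ parser allows remote attackers "
    "to execute arbitrary code via a crafted packet."
)  # hypothetical description for illustration
inputs = tokenizer(description, truncation=True, max_length=256, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
predicted_class = logits.argmax(dim=-1).item()
print(predicted_class)

In practice the head would first be fine-tuned on labeled vulnerability descriptions (e.g. CVE entries paired with severity classes) before inference is meaningful.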
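The abstract does not specify how the hybrid architectures fuse the two models. One plausible sketch, under the assumption that the fusion simply concatenates BERT's bidirectional [CLS] summary with GPT-2's unidirectional last-token state before a linear classification head, is:

# Hedged sketch of one possible GPT+BERT hybrid; the fusion strategy
# here (feature concatenation) is an illustrative assumption only.
import torch
import torch.nn as nn
from transformers import AutoModel

class HybridClassifier(nn.Module):
    def __init__(self, num_labels=4):
        super().__init__()
        self.bert = AutoModel.from_pretrained("bert-base-uncased")
        self.gpt2 = AutoModel.from_pretrained("gpt2")
        hidden = self.bert.config.hidden_size + self.gpt2.config.hidden_size
        self.head = nn.Linear(hidden, num_labels)

    def forward(self, bert_inputs, gpt2_inputs):
        # Bidirectional comprehension: BERT's [CLS] token summary.
        cls_vec = self.bert(**bert_inputs).last_hidden_state[:, 0]
        # Unidirectional, generative-style context: GPT-2's final hidden
        # state (assumes inputs are not right-padded).
        gpt_vec = self.gpt2(**gpt2_inputs).last_hidden_state[:, -1]
        return self.head(torch.cat([cls_vec, gpt_vec], dim=-1))

In a design like this, BERT contributes comprehension of the full description while GPT contributes a left-to-right generative representation; the concatenated vector is then trained end-to-end like any standard classifier.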