{"title":"A comparative analysis on using GPT and BERT for automated vulnerability scoring","authors":"Seyedeh Leili Mirtaheri , Andrea Pugliese , Narges Movahed , Reza Shahbazian","doi":"10.1016/j.iswa.2025.200515","DOIUrl":null,"url":null,"abstract":"<div><div>Large language models and transformers such as GPT and BERT have shown great improvements in many domains including cybersecurity. A constantly increasing number of vulnerabilities necessitate automated vulnerability scoring systems. Therefore, a deeper understanding of GPT and BERT compatibility with the requirements of the cybersecurity domain seems inevitable for system designers. The BERT model’s family is known to be optimized in understanding the contextual relationships with a bidirectional approach, while the GPT models perform unidirectional processing with generative capabilities. Automated vulnerability scoring systems require both the features to analyze the vulnerability and to augment the vulnerability descriptions. On the other hand, powerful GPT models are often more “resource-intensive in comparison with the BERT family. This paper presents a comprehensive comparison analysis of GPT and BERT in terms of their text classification performance, utilizing the vulnerability description classification task. We outline a thorough theoretical and experimental comparison of the models, regarding their architectures, training objectives, and fine-tuning, as well as their text classification performance. We evaluate these models on the vulnerability description classification task and employ rigorous evaluation metrics to shed light on their relative strengths and shortcomings. We also evaluate the hybrid architectures that benefit from combining GPT and BERT at the same time. Our experiment results show that they can effectively leverage the complementary strengths of both GPT and BERT, namely generative and comprehension, leading to further improvements in classification performance.</div></div>","PeriodicalId":100684,"journal":{"name":"Intelligent Systems with Applications","volume":"26 ","pages":"Article 200515"},"PeriodicalIF":0.0000,"publicationDate":"2025-04-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Intelligent Systems with Applications","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2667305325000419","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Large language models and transformer architectures such as GPT and BERT have driven substantial improvements in many domains, including cybersecurity. The constantly increasing number of vulnerabilities necessitates automated vulnerability scoring systems, so a deeper understanding of how well GPT and BERT fit the requirements of the cybersecurity domain is essential for system designers. The BERT model family is optimized for understanding contextual relationships through a bidirectional approach, while GPT models perform unidirectional processing with generative capabilities. Automated vulnerability scoring systems need both capabilities: analyzing vulnerabilities and augmenting vulnerability descriptions. On the other hand, powerful GPT models are often more resource-intensive than the BERT family. This paper presents a comprehensive comparative analysis of GPT and BERT in terms of their text classification performance, using the vulnerability description classification task. We outline a thorough theoretical and experimental comparison of the models, covering their architectures, training objectives, and fine-tuning, as well as their text classification performance. We evaluate these models on the vulnerability description classification task and employ rigorous evaluation metrics to shed light on their relative strengths and shortcomings. We also evaluate hybrid architectures that combine GPT and BERT. Our experimental results show that these hybrids can effectively leverage the complementary strengths of GPT and BERT, namely generation and comprehension, leading to further improvements in classification performance.
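
To make the classification setup concrete, the following is a minimal sketch of sequence classification with BERT on a vulnerability description, using the Hugging Face transformers library. The checkpoint name, the four-class severity label set, and the sample description are illustrative assumptions, not the paper's exact configuration.

# Minimal sketch (not the paper's exact setup): classify a vulnerability
# description with a fine-tunable BERT classification head.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

NUM_CLASSES = 4  # assumption: e.g. four severity bands (low/medium/high/critical)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=NUM_CLASSES
)  # classification head is randomly initialized until fine-tuned

description = (
    "A buffer overflow in the XYZ parser allows remote attackers "
    "to execute arbitrary code via a crafted packet."
)  # hypothetical description for illustration
inputs = tokenizer(description, truncation=True, max_length=256, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
predicted_class = logits.argmax(dim=-1).item()
print(predicted_class)

In practice the head would first be fine-tuned on labeled vulnerability descriptions (e.g. CVE entries paired with severity classes) before inference is meaningful.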
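The abstract does not specify how the hybrid architectures fuse the two models. One plausible sketch, under the assumption that the fusion simply concatenates BERT's bidirectional [CLS] summary with GPT-2's unidirectional last-token state before a linear classification head, is:

# Hedged sketch of one possible GPT+BERT hybrid; the fusion strategy
# here (feature concatenation) is an illustrative assumption only.
import torch
import torch.nn as nn
from transformers import AutoModel

class HybridClassifier(nn.Module):
    def __init__(self, num_labels=4):
        super().__init__()
        self.bert = AutoModel.from_pretrained("bert-base-uncased")
        self.gpt2 = AutoModel.from_pretrained("gpt2")
        hidden = self.bert.config.hidden_size + self.gpt2.config.hidden_size
        self.head = nn.Linear(hidden, num_labels)

    def forward(self, bert_inputs, gpt2_inputs):
        # Bidirectional comprehension: BERT's [CLS] token summary.
        cls_vec = self.bert(**bert_inputs).last_hidden_state[:, 0]
        # Unidirectional, generative-style context: GPT-2's final hidden
        # state (assumes inputs are not right-padded).
        gpt_vec = self.gpt2(**gpt2_inputs).last_hidden_state[:, -1]
        return self.head(torch.cat([cls_vec, gpt_vec], dim=-1))

In a design like this, BERT contributes comprehension of the full description while GPT contributes a left-to-right generative representation; the concatenated vector is then trained end-to-end like any standard classifier.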