Bayes-based word weighting for enhanced vulnerability classification in critical infrastructure systems
Aissa Ben Yahya, Hicham El Akhal, El Mehdi Ismaili Alaoui, Abdelbaki El Belrhiti El Alaoui
Computers & Security, Volume 154, Article 104451. Published 2025-03-25. DOI: 10.1016/j.cose.2025.104451
https://www.sciencedirect.com/science/article/pii/S0167404825001403
Abstract
The increasing number of vulnerabilities in embedded devices poses a significant threat to the security of the critical infrastructure where these devices are used. While deep learning approaches have advanced software vulnerability classification, they exhibit critical limitations regarding word weighting. Conventional methods like term frequency–inverse document frequency (TF–IDF) prioritize global term distributions but overlook intra-class distinctions. While improved variants of this technique have been proposed, they often fail to consider that a word's importance can vary across categories, and they struggle to prioritize rare but distinctive words adequately. Additionally, high inter-class semantic overlap and terminological ambiguity in vulnerability descriptions hinder model performance, because intra-class keywords are not separated from background noise. To address these gaps, we propose a novel vulnerability classification and word vector weighting approach based on Bayes' theorem. Our method dynamically adjusts term relevance by calculating posterior probabilities of word-category associations, emphasizing rare tokens with high intra-class specificity. We validate the approach on four test datasets derived from databases such as the National Vulnerability Database (NVD) and the Chinese Vulnerability Database (CNNVD). Rigorous ablation and comparative studies demonstrate that Bayes-based word weighting outperformed other methods, achieving 97.63% accuracy and a 97.60% F1-score on the most challenging test data. All our models and the code to reproduce our results are open-sourced.
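The abstract does not give the weighting formula, so the following is only a minimal sketch of what "posterior probabilities of word-category associations" could look like, not the authors' implementation. It applies Bayes' theorem, P(c | w) = P(w | c) P(c) / Σ_k P(w | c_k) P(c_k), to token counts from a labelled corpus. The function names `posterior_word_weights` and `specificity`, the Laplace smoothing constant `alpha`, the use of the maximum posterior as a weight, and the toy corpus are all assumptions made for illustration.

```python
from collections import Counter


def posterior_word_weights(docs, labels, alpha=1.0):
    """Return {word: {category: P(category | word)}} via Bayes' theorem.

    Hypothetical helper: the likelihood P(word | category) is estimated from
    token counts with Laplace smoothing (alpha), and the prior P(category)
    from the share of documents carrying that label.
    """
    categories = sorted(set(labels))
    vocab = sorted({word for doc in docs for word in doc})

    # Per-category token counts and document counts.
    word_counts = {c: Counter() for c in categories}
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc)
    token_totals = {c: sum(word_counts[c].values()) for c in categories}
    doc_counts = Counter(labels)

    weights = {}
    for word in vocab:
        joint = {}
        for c in categories:
            likelihood = (word_counts[c][word] + alpha) / (
                token_totals[c] + alpha * len(vocab)
            )
            prior = doc_counts[c] / len(docs)
            joint[c] = likelihood * prior  # P(word | c) * P(c)
        evidence = sum(joint.values())     # P(word)
        weights[word] = {c: joint[c] / evidence for c in categories}
    return weights


def specificity(weights, word):
    """Assumed scalar weight: the largest posterior over categories."""
    return max(weights[word].values())


# Toy corpus (invented for illustration, not taken from NVD or CNNVD).
docs = [
    ["buffer", "overflow", "in", "firmware"],
    ["overflow", "in", "parser"],
    ["sql", "injection", "in", "login"],
    ["injection", "in", "query"],
]
labels = ["memory", "memory", "injection", "injection"]

weights = posterior_word_weights(docs, labels)
for word in ("in", "overflow", "sql"):
    print(word, weights[word], specificity(weights, word))
```

In this toy corpus the word "in" occurs equally often in both categories, so its posterior is 0.5 for each, while "overflow" occurs only in the "memory" documents and receives a posterior of 0.75 for that category. A scalar such as `specificity` could then scale the corresponding word vector before classification, which is one way to read the abstract's emphasis on tokens with high intra-class specificity.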
About the journal:
Computers & Security is the most respected technical journal in the IT security field. With its high-profile editorial board and informative regular features and columns, the journal is essential reading for IT security professionals around the world.
Computers & Security provides you with a unique blend of leading-edge research and sound practical management advice. It is aimed at the professional involved with computer security, audit, control and data integrity in all sectors: industry, commerce and academia. Recognized worldwide as the primary source of reference for applied research and technical expertise, it is your first step to fully secure systems.