Muhammad Rehan Naeem, Rashid Amin, Muhammad Farhan, Faisal S Alsubaei, Eesa Alsolami, Muhammad D Zakaria
Cyber security Enhancements with reinforcement learning: A zero-day vulnerability identification perspective.
PLoS ONE, 20(5): e0324595. Published 2025-05-27.
DOI: 10.1371/journal.pone.0324595 (https://doi.org/10.1371/journal.pone.0324595)
Citations: 0
Abstract
A zero-day vulnerability is a critical security weakness in software or hardware that has not yet been discovered, so neither the vendor nor the users are aware of it. Malicious actors can exploit such vulnerabilities to launch cyber-attacks with severe consequences for organizations and individuals. Because nobody is aware of these weaknesses, detecting and preventing them is challenging. For real-time detection of zero-day vulnerabilities, we propose a novel reinforcement learning (RL) methodology based on Deep Q-Networks (DQN). It learns to identify vulnerabilities without any prior knowledge of them, and it is evaluated using rigorous statistical metrics. It surpasses traditional methods in that it adapts to changing threats and copes with intricate state spaces while offering scalability to cybersecurity personnel. In this paper, we introduce this methodology for zero-day vulnerability detection. Zero-day vulnerabilities are security weaknesses that have never been exposed or published and are considered highly dangerous for systems and networks. Our method exploits reinforcement learning, a sub-type of machine learning that trains agents to make decisions and take actions that maximize an approximation of an underlying cumulative reward signal, to discover patterns and features in data related to zero-day discovery. Training the agent could allow real-time detection and classification of zero-day vulnerabilities. Our approach has the potential to be a powerful tool for detecting and defending against zero-day vulnerabilities and may bring significant benefits to security experts and researchers in the field of cybersecurity. It offers several advantages over previous approaches to vulnerability discovery.
It is applicable to systems with complex behaviour, such as the ones presented throughout this work, and can respond to new security threats in real time. Moreover, it does not require any knowledge about the vulnerability itself, so it can discover hidden weak points. In the present paper, we statistically evaluated the forecasted values of several parameters in a reinforcement learning environment. We used 1000 episodes to train the model and a further 1000 episodes for forecasting with the trained model. The statistical evaluation showed an Alpha value of 0.10, indicating good accuracy in the forecast. Beta was 0.00, meaning no bias in the forecast. Gamma was also 0.00, indicating a very high level of precision in the forecast. MASE was 3.91 and SMAPE was 1.59, meaning that the percentage error in the forecast was very small. MAE was 6.34 and RMSE was 10.22, meaning a relatively low average difference between the actual and forecasted values. The results demonstrate the effectiveness of reinforcement learning models in solving complex problems and suggest that the model's accuracy improves as more training data is added.
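The abstract reports MAE, RMSE, SMAPE and MASE but does not say how they were computed. The sketch below uses the common textbook definitions and is an assumption about those definitions, not the authors' evaluation code. The function names, the choice of a one-step naive forecast as the MASE baseline, and the sample numbers are all hypothetical and are not taken from the paper.

```python
import math


def mae(actual, forecast):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean squared error between two equal-length sequences."""
    squared = sum((a - f) ** 2 for a, f in zip(actual, forecast))
    return math.sqrt(squared / len(actual))


def smape(actual, forecast):
    """Symmetric mean absolute percentage error, in percent.

    Pairs where both values are zero contribute zero error.
    """
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2
        if denom:
            total += abs(a - f) / denom
    return 100 * total / len(actual)


def mase(actual, forecast, train):
    """Mean absolute scaled error.

    Assumption: the scale is the in-sample MAE of a one-step naive
    forecast (each training value predicted by the previous one).
    """
    naive_errors = [abs(train[i] - train[i - 1]) for i in range(1, len(train))]
    scale = sum(naive_errors) / len(naive_errors)
    return mae(actual, forecast) / scale


if __name__ == "__main__":
    # Illustrative numbers only; they are not data from the paper.
    train = [1.0, 3.0, 5.0, 7.0]
    actual = [10.0, 12.0, 14.0, 16.0]
    forecast = [11.0, 11.0, 15.0, 15.0]
    print("MAE:", mae(actual, forecast))
    print("RMSE:", rmse(actual, forecast))
    print("SMAPE (%):", smape(actual, forecast))
    print("MASE:", mase(actual, forecast, train))
```

Under this definition, MASE compares the forecast with a naive baseline: a value below 1 means the forecast has a smaller average absolute error than the naive one-step forecast on the training series, and a value above 1 means a larger one. MAE and RMSE are in the units of the forecasted quantity, so they are only comparable across series with the same scale.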
About the journal:
PLOS ONE is an international, peer-reviewed, open-access, online publication. PLOS ONE welcomes reports on primary research from any scientific discipline. It provides:
* Open-access—freely accessible online, authors retain copyright
* Fast publication times
* Peer review by expert, practicing researchers
* Post-publication tools to indicate quality and impact
* Community-based dialogue on articles
* Worldwide media coverage