Cyber security Enhancements with reinforcement learning: A zero-day vulnerability identification perspective.

IF 2.9; Region 3 (Multidisciplinary Journals); JCR Q1 (MULTIDISCIPLINARY SCIENCES)

PLoS ONE; Pub Date: 2025-05-27; eCollection Date: 2025-01-01; DOI: 10.1371/journal.pone.0324595

Muhammad Rehan Naeem, Rashid Amin, Muhammad Farhan, Faisal S Alsubaei, Eesa Alsolami, Muhammad D Zakaria

PLoS ONE, volume 20, issue 5, e0324595. Publication type: Journal Article.

Citations: 0

Abstract

A zero-day vulnerability is a critical security weakness in software or hardware that has not yet been discovered or disclosed, so neither the vendor nor the users know about it. Attackers can exploit such vulnerabilities to launch cyber-attacks with severe consequences for organizations and individuals, and because nobody is aware of the weakness, detection and prevention are difficult. In this paper, we propose a novel reinforcement learning (RL) methodology based on Deep Q-Networks (DQN) for real-time zero-day vulnerability detection. Reinforcement learning is a branch of machine learning that trains agents to make decisions and take actions that maximize an approximation of an underlying cumulative reward signal; our method uses it to discover patterns and features in data related to zero-day discovery. The agent learns without any prior knowledge of the vulnerabilities themselves, which allows it to uncover hidden weak points, and a trained agent could detect and classify zero-day vulnerabilities in real time. Compared with traditional methods, the approach adapts to changing threats, copes with intricate state spaces, applies to systems with complex behaviour such as those presented in this paper, and scales for cybersecurity personnel. It therefore has the potential to be a powerful tool for detecting and defending against zero-day vulnerabilities, and may bring significant benefits to security experts and researchers in cyber-security.

We statistically evaluated the forecasted values of several parameters in a reinforcement learning environment, using 1000 episodes to train the model and a further 1000 episodes to forecast with the trained model. The Alpha value was 0.10, indicating good forecast accuracy. Beta was 0.00, meaning no bias in the forecast, and Gamma was also 0.00, indicating a very high level of precision. MASE was 3.91 and SMAPE was 1.59, meaning that the percentage error in the forecast was very small. The MAE was 6.34 and the RMSE was 10.22, a relatively low average difference between the actual and forecasted values. The results demonstrate the effectiveness of reinforcement learning models in solving complex problems and suggest that the model's accuracy improves as more training data is added.
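The abstract names Deep Q-Networks but does not give the update rule. As a minimal sketch, assuming the standard DQN formulation rather than the authors' exact implementation, the code below computes the temporal-difference target (reward plus discounted best next-state value from a target network) and the mean squared loss a Q-network would be trained on. The function names, the two-action layout, and the discount value are hypothetical. The discount factor here is also unrelated to the Gamma statistic reported in the evaluation.

```python
import numpy as np


def dqn_targets(rewards, next_q_values, dones, discount=0.99):
    """Standard DQN target: r + discount * max_a' Q_target(s', a').

    For terminal transitions (done == 1) the target is just the reward.
    `next_q_values` has shape (batch, n_actions) and is assumed to come
    from a separate target network (hypothetical here).
    """
    rewards = np.asarray(rewards, dtype=float)
    next_q_values = np.asarray(next_q_values, dtype=float)
    dones = np.asarray(dones, dtype=float)
    best_next = next_q_values.max(axis=1)  # greedy value of the next state
    return rewards + discount * (1.0 - dones) * best_next


def td_loss(q_values, actions, targets):
    """Mean squared error between Q(s, a) for the taken actions and the targets."""
    q_values = np.asarray(q_values, dtype=float)
    actions = np.asarray(actions, dtype=int)
    chosen = q_values[np.arange(len(actions)), actions]
    return float(np.mean((chosen - np.asarray(targets, dtype=float)) ** 2))


# Toy batch of two transitions; all values are illustrative, not from the paper.
targets = dqn_targets(
    rewards=[1.0, 0.0],
    next_q_values=[[0.5, 2.0], [1.0, 3.0]],
    dones=[0, 1],
    discount=0.9,
)
loss = td_loss(q_values=[[2.0, 0.0], [0.0, 1.0]], actions=[0, 1], targets=targets)
print(targets, loss)
```

In a full agent, the loss would be minimized by gradient descent on the Q-network's weights, with transitions sampled from a replay buffer. Both are omitted here to keep the sketch short.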
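The error measures quoted in the abstract (MAE, RMSE, SMAPE, MASE) can be computed as sketched below. The definitions are the common textbook ones. The abstract does not say which SMAPE variant or which MASE scaling the authors used, so two choices here are assumptions: SMAPE is given in percent with the mean of |actual| and |forecast| in the denominator, and MASE is scaled by the in-sample one-step naive forecast. The numbers are toy data, not the paper's.

```python
import numpy as np


def mae(actual, forecast):
    """Mean absolute error."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.mean(np.abs(actual - forecast)))


def rmse(actual, forecast):
    """Root mean squared error; penalizes large misses more than MAE."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.sqrt(np.mean((actual - forecast) ** 2)))


def smape(actual, forecast):
    """Symmetric MAPE in percent (assumed variant, range 0 to 200).

    Undefined when an actual value and its forecast are both zero.
    """
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    denom = (np.abs(actual) + np.abs(forecast)) / 2.0
    return float(100.0 * np.mean(np.abs(actual - forecast) / denom))


def mase(actual, forecast, train):
    """MAE divided by the in-sample MAE of a one-step naive forecast.

    Values below 1 mean the forecast beats the naive baseline on average.
    """
    scale = np.mean(np.abs(np.diff(np.asarray(train, float))))
    return mae(actual, forecast) / float(scale)


# Toy series (illustrative only).
train = [10.0, 12.0, 11.0, 13.0]
actual = [14.0, 15.0]
forecast = [13.0, 17.0]
print(mae(actual, forecast), rmse(actual, forecast))
print(smape(actual, forecast), mase(actual, forecast, train))
```

Because MASE is a ratio against a naive baseline and SMAPE is a percentage, the four figures are on different scales and are best read together, not in isolation.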

Source journal

PLoS ONE (Biology)

CiteScore: 6.20
Self-citation rate: 5.40%
Articles published: 14242
Review time: 3.7 months

Journal description: PLOS ONE is an international, peer-reviewed, open-access, online publication. PLOS ONE welcomes reports on primary research from any scientific discipline. It provides:

* Open access: freely accessible online, authors retain copyright
* Fast publication times
* Peer review by expert, practicing researchers
* Post-publication tools to indicate quality and impact
* Community-based dialogue on articles
* Worldwide media coverage