Ransomware detection and family classification using fine-tuned BERT and RoBERTa models

IF 4.3; Region 3, Computer Science; Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Egyptian Informatics Journal Pub Date : 2025-05-08 DOI:10.1016/j.eij.2025.100645

Amjad Hussain, Ayesha Saadia, Faeiz M. Alserhani

Egyptian Informatics Journal, volume 30, Article 100645. Journal article. https://www.sciencedirect.com/science/article/pii/S1110866525000386

Citations: 0

Abstract

Integrating Internet of Things (IoT) technologies into healthcare has transformed patient care by enabling real-time monitoring, predictive analytics, and personalized treatment. However, this integration presents significant challenges that must be addressed to ensure secure and reliable deployment. Healthcare IoT devices, such as remote patient monitors, are often constrained by limited computational power, which makes them vulnerable to sophisticated cyberattacks, including ransomware. In 2017, the WannaCry ransomware attack disrupted many National Health Service facilities in the United Kingdom and underscored the critical need for robust cybersecurity measures. The lack of standardization across IoT devices also creates interoperability issues and complicates data transfer between medical devices and healthcare systems. This research explores these challenges and proposes a novel approach that uses two hyperparameter-optimized, transfer-learning-based models, Bidirectional Encoder Representations from Transformers (BERT) and the Robustly Optimized BERT Pretraining Approach (RoBERTa), to both detect and classify ransomware targeting IoT devices by analyzing API call sequences recorded during dynamic execution in a sandbox environment. A total of 3300 samples, covering 10 ransomware families and including 300 benign cases, were analyzed dynamically in the sandbox. The newly created dataset was then preprocessed and fed to the BERT and RoBERTa models for training. In classifying ransomware families, BERT achieved 95.60% accuracy with a loss of 0.1650, while RoBERTa achieved 94.39% accuracy with a loss of 0.1948. These results indicate that the proposed approach can classify previously unidentified behavioral patterns in ransomware and improves the ability to tackle newly emerging threats. By combining dynamic analysis of consistently formatted API call sequences with hyperparameter-optimized, transfer-learning-based transformer models, the methodology efficiently captures behavioral patterns unique to ransomware. The research provides a scalable framework for integrating advanced detection mechanisms into real-world healthcare IoT systems, enhancing their resilience against cyber threats.
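The abstract states that sandbox API call sequences are preprocessed before being fed to BERT and RoBERTa, but it does not describe the exact format. The following is a minimal sketch of one plausible preprocessing step, not the authors' pipeline: it assumes each sample is a list of API call names plus a class label, collapses consecutive repeated calls, truncates to an assumed cap, and joins the calls into one string that a transformer tokenizer could consume. The names `MAX_CALLS`, `collapse_repeats`, `to_model_text`, `build_label_map`, and `encode_dataset`, the cap of 512 calls, and the example traces are all hypothetical.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

# Assumed cap on calls per sample; the abstract does not state a sequence length.
MAX_CALLS = 512

Sample = Tuple[Sequence[str], str]  # (API call trace, class label)


def collapse_repeats(calls: Iterable[str]) -> List[str]:
    """Drop blank entries and consecutive duplicate API calls (an assumed cleanup step)."""
    collapsed: List[str] = []
    for call in calls:
        name = call.strip()
        if name and (not collapsed or collapsed[-1] != name):
            collapsed.append(name)
    return collapsed


def to_model_text(calls: Iterable[str], max_calls: int = MAX_CALLS) -> str:
    """Turn one sample's API call trace into a single space-separated string."""
    return " ".join(collapse_repeats(calls)[:max_calls])


def build_label_map(labels: Iterable[str]) -> Dict[str, int]:
    """Map each class name (a family or 'benign') to an integer id, sorted for stability."""
    return {name: idx for idx, name in enumerate(sorted(set(labels)))}


def encode_dataset(samples: Sequence[Sample]) -> Tuple[List[str], List[int], Dict[str, int]]:
    """Produce the text inputs and integer labels for a sequence classifier."""
    label_map = build_label_map(label for _, label in samples)
    texts = [to_model_text(calls) for calls, _ in samples]
    ids = [label_map[label] for _, label in samples]
    return texts, ids, label_map


# Hypothetical traces for illustration only; not taken from the paper's dataset.
samples: List[Sample] = [
    (["NtCreateFile", "NtWriteFile", "NtWriteFile", "CryptEncrypt"], "family_a"),
    (["NtOpenKey", "NtQueryValueKey", " ", "NtClose"], "benign"),
]
texts, ids, label_map = encode_dataset(samples)
```

With 10 ransomware families plus a benign class, a label map built this way would contain 11 entries, and the resulting `texts` and `ids` could then be passed to a tokenizer and a sequence classification head for fine-tuning.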

Source journal

Egyptian Informatics Journal (Decision Sciences: Management Science and Operations Research)

CiteScore

11.10

Self-citation rate

1.90%

Articles published

Review time

110 days

Journal description: The Egyptian Informatics Journal is published by the Faculty of Computers and Artificial Intelligence, Cairo University. The Journal provides a forum for state-of-the-art research and development in the fields of computing, including computer science, information technology, information systems, operations research, and decision support. It encourages submission of innovative, previously unpublished work in the subjects it covers, whether from academic, research, or commercial sources.