Transformer-Based Vulnerability Detection in IoT Firmware Binaries Using Opcode Sequences

Impact factor 3.4; Zone 3, Computer Science; JCR Q2, Computer Science, Information Systems

IEEE Access, Pub Date: 2025-07-15, DOI: 10.1109/ACCESS.2025.3588950

M. Nandish;Jalesh Kumar;H. G. Mohan;M. V. Manoj Kumar

IEEE Access, vol. 13, pp. 124250-124263. Full text: https://ieeexplore.ieee.org/document/11080410/


Abstract

Firmware security is critical for maintaining the integrity of embedded systems. However, detecting vulnerabilities in firmware binaries is challenging because of the absence of source code, the inherent complexity of binary structures, the diversity of hardware architectures, and the difficulty of extracting deep contextual representations from binaries. The proposed approach uses Decoding-enhanced BERT with Disentangled Attention (DeBERTa), a transformer-based model, to detect vulnerabilities in firmware binaries. First, firmware binaries are disassembled to extract opcode sequences, which are then tokenized and encoded as inputs to the DeBERTa model. The model processes the instruction opcode sequences and generates embeddings, which are then used for classification. Three classifiers operate on the DeBERTa-generated embeddings: Random Forest, a Multi-Layer Perceptron, and a GAN-based classifier. The model learns deep contextual representations of firmware code, capturing intricate syntactic and semantic relationships. The evaluation is conducted on IoT firmware binaries collected from real-world IoT projects, reflecting practical and diverse vulnerability scenarios. Experimental results show that the proposed DeBERTa-based model achieves 97% accuracy, 97% recall, and a 94.6% F1-score, outperforming conventional embedding techniques. The findings indicate that opcode sequence features support effective and reliable detection of different types of vulnerable and benign IoT samples.
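The front end of the pipeline described in the abstract (disassembly output to opcode sequence to token ids) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name a disassembler or a tokenizer, so the tab-separated objdump-style listing, the whole-mnemonic tokens, the `[PAD]`/`[UNK]` special tokens, and the function names `extract_opcodes`, `build_vocab`, and `encode` are all assumptions made for the example. The transformer encoder and the downstream classifiers are left out.

```python
from collections import Counter

PAD, UNK = "[PAD]", "[UNK]"  # assumed special tokens, not taken from the paper

# Hypothetical objdump-style listing: "address:<TAB>raw bytes<TAB>mnemonic<TAB>operands".
SAMPLE = (
    "00010400 <main>:\n"
    "   10400:\te92d4800 \tpush\t{fp, lr}\n"
    "   10404:\te28db004 \tadd\tfp, sp, #4\n"
    "   10408:\teb000010 \tbl\t10450 <strcpy>\n"
    "   1040c:\te8bd8800 \tpop\t{fp, pc}\n"
)


def extract_opcodes(listing: str) -> list[str]:
    """Keep only the instruction mnemonics, dropping addresses, bytes, and operands."""
    opcodes = []
    for line in listing.splitlines():
        fields = line.split("\t")
        # Instruction lines have at least three tab-separated fields and start
        # with "address:"; symbol labels and byte-only continuation lines do not.
        if len(fields) < 3 or not fields[0].strip().endswith(":"):
            continue
        parts = fields[2].split()
        if parts:
            opcodes.append(parts[0].lower())
    return opcodes


def build_vocab(sequences: list[list[str]], min_count: int = 1) -> dict[str, int]:
    """Map each opcode to an integer id, most frequent first, ties broken by name."""
    counts = Counter(op for seq in sequences for op in seq)
    vocab = {PAD: 0, UNK: 1}
    for op, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if n >= min_count:
            vocab[op] = len(vocab)
    return vocab


def encode(seq: list[str], vocab: dict[str, int], max_len: int) -> tuple[list[int], list[int]]:
    """Truncate or pad to max_len; return (token ids, attention mask)."""
    ids = [vocab.get(op, vocab[UNK]) for op in seq[:max_len]]
    mask = [1] * len(ids)
    pad = max_len - len(ids)
    return ids + [vocab[PAD]] * pad, mask + [0] * pad


if __name__ == "__main__":
    ops = extract_opcodes(SAMPLE)
    vocab = build_vocab([ops])
    print(ops)
    print(encode(ops, vocab, max_len=6))
```

The ids and attention mask are the kind of fixed-length input a transformer encoder consumes. Under the approach the abstract describes, the encoder's output embeddings would then be passed to a classifier such as a Random Forest or a Multi-Layer Perceptron.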


Source journal

IEEE Access (Computer Science, Information Systems; Engineering, Electrical & Electronic)

CiteScore: 9.80

Self-citation rate: 7.70%

Articles published: 6673

Review time: 6 weeks

About the journal: IEEE Access® is a multidisciplinary, open access (OA), applications-oriented, all-electronic archival journal that continuously presents the results of original research or development across all of IEEE's fields of interest. IEEE Access will publish articles that are of high interest to readers, original, technically correct, and clearly presented. Supported by author publication charges (APC), its hallmarks are a rapid peer review and publication process with open access to all readers. Unlike IEEE's traditional Transactions or Journals, reviews are "binary", in that reviewers will either Accept or Reject an article in the form it is submitted in order to achieve rapid turnaround. Especially encouraged are submissions on: Multidisciplinary topics, or applications-oriented articles and negative results that do not fit within the scope of IEEE's traditional journals. Practical articles discussing new experiments or measurement techniques, interesting solutions to engineering. Development of new or improved fabrication or manufacturing techniques. Reviews or survey articles of new or evolving fields oriented to assist others in understanding the new area.