Multimodal Fusion for Android Malware Detection Based on Large Pre-Trained Models

IF 5.6 · Zone 1 (Computer Science) · JCR Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

IEEE Transactions on Software Engineering · Pub Date: 2025-04-03 · DOI: 10.1109/TSE.2025.3557577

Xun Li, Lei Liu, Yuzhou Liu, Yu Zhao, Peng Zhang, Huaxiao Liu

Volume 51, Issue 5, pp. 1569-1590 · https://ieeexplore.ieee.org/document/10948385/

Citations: 0

Abstract

Malware detection is a critical issue in software engineering because malware directly threatens the security of user information. Existing approaches often rely on a single modality (either source code or binary code) and fail to exploit the complementary information between the two, which limits detection performance, especially against complex and evasive malware. In this paper, we target Android applications written in Java and propose a novel fine-grained multimodal fusion method that uses large pre-trained models to combine features from source code and binary code for malware detection. For the source code modality, we use the graphical user interface (GUI) as a framework to segment the source code into snippets, and we extract feature representations with a pre-trained programming language model. For the binary code modality, we convert binary code into grayscale images and fine-tune a pre-trained vision model to extract features indirectly. We then apply cross-modal attention and devise a contrastive loss to align features across modalities, supplemented by a supervised classification loss that tailors the multimodal fusion process to malware detection. Experiments on the Data-MD and Data-MC benchmarks show that our approach achieves a precision of 0.977 and a recall of 0.984 in detecting malware. These results underscore the advantages of using large pre-trained models for feature representation and of fusing information across modalities for effective malware detection.
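The binary-to-image step can be illustrated with a minimal sketch. The abstract does not specify the layout, so this assumes the fixed-width reshaping common in byte-visualization malware work: every byte becomes one pixel intensity (0 to 255), rows have a fixed width, and the final row is zero-padded. The helper names `bytes_to_grayscale` and `resize_nearest`, the default width of 256, and the nearest-neighbour resize are hypothetical choices, not the authors' implementation.

```python
import math


def bytes_to_grayscale(data: bytes, width: int = 256) -> list[list[int]]:
    """Reshape raw binary bytes into rows of grayscale pixels.

    Assumption: fixed row width, one byte per pixel, zero padding at the end.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    height = max(1, math.ceil(len(data) / width))
    # Pad with zero bytes so the last row is complete.
    padded = data + bytes(height * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


def resize_nearest(img: list[list[int]], out_h: int, out_w: int) -> list[list[int]]:
    """Nearest-neighbour resize to a model's input size (hypothetical helper)."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


if __name__ == "__main__":
    image = bytes_to_grayscale(b"\x00\x7f\xff\x10\x20", width=4)
    print(image)  # [[0, 127, 255, 16], [32, 0, 0, 0]]
```

Pre-trained vision models usually expect a fixed input size (for example 224 by 224 pixels), which is why the sketch includes a resize; the paper may instead crop, pad, or interpolate differently.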
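Cross-modal attention can be sketched as scaled dot-product attention in which the queries come from one modality and the keys and values from the other, so that each source-code snippet embedding gathers information from the binary image's features. This pure-Python sketch omits the learned query, key, and value projections and the multi-head structure a real model would use; the function name `cross_modal_attention` and the direction shown (code attends to image) are assumptions, since the abstract does not give the paper's exact design.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(queries, keys, values):
    """Attend from one modality to the other.

    queries: vectors from modality A (e.g. code-snippet embeddings)
    keys/values: vectors from modality B (e.g. image-patch embeddings)
    Returns one fused vector per query: a softmax-weighted sum of values.
    Learned projections are omitted for brevity (assumption).
    """
    d = len(keys[0])
    scale = 1.0 / math.sqrt(d)
    fused = []
    for q in queries:
        # Similarity of this query to every key, scaled by 1/sqrt(d).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        fused.append([sum(w * v[j] for w, v in zip(weights, values))
                      for j in range(len(values[0]))])
    return fused


if __name__ == "__main__":
    code_feats = [[1.0, 0.0], [0.0, 1.0]]
    image_feats = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
    print(cross_modal_attention(code_feats, image_feats, image_feats))
```

Calling the same function with the roles swapped (image features as queries, code features as keys and values) gives the other direction; how the paper combines the two directions is not stated in the abstract.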
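The training objective pairs a contrastive alignment loss with a supervised classification loss. The abstract does not give the exact formulation, so this sketch swaps in a common choice: a symmetric InfoNCE loss (as popularized by CLIP) over a batch in which the i-th source-code embedding and the i-th binary-image embedding come from the same app, plus binary cross-entropy on the predicted malware probability. The weighting `lam`, the temperature of 0.07, and all function names are hypothetical.

```python
import math


def _log_softmax(row: list[float]) -> list[float]:
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def _cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def contrastive_loss(src, img, temperature: float = 0.07) -> float:
    """Symmetric InfoNCE: src[i] should match img[i] (same app) against
    every other pairing in the batch, in both directions."""
    n = len(src)
    logits = [[_cosine(s, v) / temperature for v in img] for s in src]
    src_to_img = -sum(_log_softmax(logits[i])[i] for i in range(n)) / n
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    img_to_src = -sum(_log_softmax(cols[j])[j] for j in range(n)) / n
    return (src_to_img + img_to_src) / 2


def classification_loss(probs, labels) -> float:
    """Binary cross-entropy; probs[i] is the predicted malware probability."""
    eps = 1e-12  # guards against log(0)
    return -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                for p, y in zip(probs, labels)) / len(labels)


def total_loss(src, img, probs, labels, lam: float = 0.5) -> float:
    # Hypothetical weighting: supervised loss plus lam times alignment loss.
    return classification_loss(probs, labels) + lam * contrastive_loss(src, img)


if __name__ == "__main__":
    src = [[1.0, 0.2], [0.1, 1.0]]
    img = [[0.9, 0.1], [0.2, 0.8]]
    print(total_loss(src, img, probs=[0.9, 0.2], labels=[1, 0]))
```

Minimizing the contrastive term pulls the two views of the same app together and pushes apart views of different apps, while the classification term keeps the fused representation discriminative for the malware label.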




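Source journal

IEEE Transactions on Software Engineering (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 9.70
Self-citation rate: 10.80%
Articles published: 724
Review time: 6 months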

Journal description: IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include:
a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models.
b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects.
c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards.
d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues.
e) System issues: Hardware-software trade-offs.
f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.