MalGEA: A malware analysis framework via matrix factorization based node embedding and graph external attention

IF 4.5 Q2 COMPUTER SCIENCE, THEORY & METHODS
Array, Volume 27, Article 100493 · Pub Date: 2025-08-19 · DOI: 10.1016/j.array.2025.100493
Ruisheng Li, Qilong Zhang, Huimin Shen
https://www.sciencedirect.com/science/article/pii/S2590005625001201
Citations: 0

Abstract

Malware is one of the major threats in cybersecurity, and its volume has grown continuously and steadily. In recent years, researchers have proposed a number of malware detection methods based on graph representation learning that leverage the intrinsic topological features of malware, which has led to considerable progress in this area. However, existing studies still have two major limitations. (1) The complex topological structures of malware graphs often result in high computational overhead during feature extraction and processing. (2) Most existing approaches rely on conventional graph neural networks that are not specifically designed for malware classification tasks, leading to suboptimal performance, especially on minority-class samples. To address these problems, we propose MalGEA, a novel malware detection and classification framework based on matrix factorization and graph external attention mechanisms. First, MalGEA extracts function call information from malware and constructs the corresponding function call graphs. These graphs are then processed using sparse matrix factorization and spectral propagation to efficiently generate node embeddings. Finally, we employ a graph external attention network to model inter-graph relationships and perform malware detection and classification. To evaluate our approach, we used a benchmark malware dataset that contains 6 categories and 35 families, including 50k benign and 50k malicious samples. Experimental results demonstrate that our method significantly outperforms existing node embedding approaches in terms of computational efficiency, while also achieving high accuracy in malware detection and family classification tasks.
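The abstract does not give MalGEA's exact attention formulation, so the following is only a minimal sketch of the generic external attention operation that such a network might build on: node features are compared against a small key memory shared across all graphs, normalized twice (softmax over nodes, then L1 over memory slots), and mixed with a value memory. The function names `external_attention` and `make_memory`, the memory size, and the random initialization are illustrative assumptions, not details taken from the paper; in a trained model the memories would be learned parameters.

```python
import math
import random


def external_attention(features, mem_k, mem_v):
    """Generic external attention over node features (illustrative sketch).

    features: N x d list of node embeddings for one graph
    mem_k, mem_v: S x d external memories shared by all graphs
                  (learned in practice; fixed random values here)
    Returns (output N x d, attention N x S).
    """
    n, s = len(features), len(mem_k)
    # Similarity between every node and every memory slot: N x S.
    scores = [[sum(f * k for f, k in zip(row, slot)) for slot in mem_k]
              for row in features]
    # Double normalization, step 1: softmax over the node axis (per column).
    attn = [[0.0] * s for _ in range(n)]
    for j in range(s):
        col_max = max(scores[i][j] for i in range(n))
        exps = [math.exp(scores[i][j] - col_max) for i in range(n)]
        total = sum(exps)
        for i in range(n):
            attn[i][j] = exps[i] / total
    # Step 2: L1-normalize each node's row over the memory axis.
    for i in range(n):
        row_sum = sum(attn[i])
        attn[i] = [a / row_sum for a in attn[i]]
    # Output: each node becomes an attention-weighted mix of the value memory.
    d = len(mem_v[0])
    out = [[sum(attn[i][j] * mem_v[j][c] for j in range(s)) for c in range(d)]
           for i in range(n)]
    return out, attn


def make_memory(slots, dim, seed):
    """Hypothetical helper: a fixed random memory standing in for learned weights."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(slots)]
```

Because the same `mem_k` and `mem_v` would be applied to every function call graph, the memories can in principle capture patterns that recur across samples, which is one plausible reading of the "inter-graph relationships" mentioned in the abstract. The cost is linear in the number of nodes for a fixed memory size.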
Source journal: Array (Computer Science: General Computer Science)
CiteScore: 4.40
Self-citation rate: 0.00%
Articles published: 93
Review time: 45 days