Multi-perspective API call sequence behavior analysis and fusion for malware classification
Peng Wu, Mohan Gao, Fuhui Sun, Xiaoyan Wang, Li Pan
Computers & Security, volume 148, Article 104177. Published 2024-10-29. DOI: 10.1016/j.cose.2024.104177
https://www.sciencedirect.com/science/article/pii/S0167404824004826
Abstract
The growing variety of malicious software (malware) has caused great damage and economic loss to computer systems. The API call sequence of malware reflects its dynamic behavior during execution, which is difficult to disguise. The API call sequence can therefore serve as a robust feature for detecting and classifying malware. The statistical analysis presented in this paper reveals two distinct characteristics within the API call sequences of different malware: (1) the API existence feature, caused by frequent calls to APIs with certain special functions, and (2) the API transition feature, caused by frequent calls to certain special API subsequence patterns. Based on these two characteristics, this paper proposes MINES, a Multi-perspective apI call sequeNce bEhavior fuSion malware classification method. Specifically, the API existence features from different perspectives are described by two graphs that model diverse, rich, and complex existence relationships between APIs, and a graph contrastive learning framework is adopted to extract the consistent, shared API existence feature from the two graphs. Similarly, the API transition features of different hops are described by multi-order transition probability matrices. By treating each order as a channel, a CNN-based contrastive learning framework is adopted to extract the API transition feature. Finally, the two kinds of extracted features are fused to classify malware. Experiments on five datasets show that MINES outperforms various state-of-the-art methods by a large margin.
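To make the two feature types in the abstract concrete, the following is a minimal sketch of how they might be computed from a single API trace. It is not the authors' implementation: the abstract does not say how the hops are counted or normalized, so the sketch assumes that the order-k matrix holds the row-normalized frequency of API j appearing exactly k calls after API i. The function names, the small vocabulary, and the trace are all hypothetical.

```python
def existence_vector(seq, vocab):
    """Binary indicator per API: 1 if the API appears anywhere in the trace."""
    present = set(seq)
    return [1 if api in present else 0 for api in vocab]


def transition_matrix(seq, vocab, hop=1):
    """Row-normalized frequency of API j occurring exactly `hop` calls after API i.

    Assumption: this hop-based counting is one plausible reading of
    "transition features of different hops"; the paper may define it differently.
    """
    index = {api: i for i, api in enumerate(vocab)}
    n = len(vocab)
    counts = [[0] * n for _ in range(n)]
    # Pair every call with the call `hop` positions later in the trace.
    for src, dst in zip(seq, seq[hop:]):
        counts[index[src]][index[dst]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Rows for APIs with no outgoing transitions stay all-zero.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def multi_order_tensor(seq, vocab, max_hop=3):
    """Stack the hop-1..max_hop matrices, giving shape (max_hop, |vocab|, |vocab|).

    Each order plays the role of one input channel for a CNN.
    """
    return [transition_matrix(seq, vocab, hop) for hop in range(1, max_hop + 1)]


# Hypothetical vocabulary and trace, for illustration only.
vocab = ["CreateFile", "WriteFile", "CloseHandle", "RegSetValue"]
trace = ["CreateFile", "WriteFile", "WriteFile", "CloseHandle",
         "CreateFile", "WriteFile", "CloseHandle"]

exists = existence_vector(trace, vocab)
tensor = multi_order_tensor(trace, vocab, max_hop=2)
```

Stacking the matrices along a leading axis mirrors the "each order as a channel" idea: a convolutional encoder can then consume the tensor much as it would a multi-channel image. The contrastive learning and fusion stages described in the abstract are outside the scope of this sketch.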
About the journal:
Computers & Security is the most respected technical journal in the IT security field. With its high-profile editorial board and informative regular features and columns, the journal is essential reading for IT security professionals around the world.
Computers & Security provides you with a unique blend of leading-edge research and sound practical management advice. It is aimed at the professional involved with computer security, audit, control and data integrity in all sectors: industry, commerce and academia. Recognized worldwide as the primary source of reference for applied research and technical expertise, it is your first step to fully secure systems.