Functions-based CFG Embedding for Malware Homology Analysis

2019 26th International Conference on Telecommunications (ICT). Pub Date: 2019-04-01. DOI: 10.1109/ICT.2019.8798769

Jieran Liu, Yuan Shen, Hanbing Yan


Citations: 4

Abstract

Malware homology analysis aims to determine whether different malicious code samples originate from the same code base or were written by the same author or team, and whether they share intrinsic relevance and similarity. Homology analysis of malicious code is also an important part of studying the groups behind different APT (Advanced Persistent Threat) attacks. At present, homology identification in the anti-malware industry still relies on manual analysis and the experience of security experts, and research on large-scale automated homology analysis of malicious code remains insufficient. The method proposed in this paper addresses the problem of large-scale automated homology analysis of malicious code and is intended to provide auxiliary information for discovering the group behind an APT attack. We collected samples of different APT groups from public threat intelligence and propose a novel approach that classifies these samples into APT groups in order to further analyze the homology of malware. We combine the CFG (Control Flow Graph) of each malicious code function with the disassembled code of the stripped malware to generate an embedding, i.e., a numeric vector; these embeddings form a function feature database of the APT groups. We also present a neural network model for APT group classification. We have implemented our approach in a prototype system called MCrab. Our extensive evaluation showed that MCrab could produce high-accuracy results with few to no false positives. Our research also showed that deep learning can be successfully applied to malware homology analysis.
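To make the idea of a function-level CFG embedding concrete, the following is a minimal sketch in Python. It is not MCrab's implementation, which the abstract does not specify. The opcode categories, the neighbour-aggregation rule, the 0.5 weight, and the names `block_features` and `embed_function` are all assumptions chosen for illustration. Each basic block is described by counts of instruction categories taken from its disassembly, block states are mixed along CFG edges for a few rounds, and the block states are summed into one fixed-length numeric vector per function.

```python
from collections import Counter

# Hypothetical opcode categories used as per-block features.
# This mapping is an illustrative assumption, not taken from the paper.
CATEGORIES = {
    "mov": "transfer", "push": "transfer", "pop": "transfer", "lea": "transfer",
    "add": "arith", "sub": "arith", "xor": "arith", "cmp": "arith",
    "call": "call",
    "jmp": "branch", "jz": "branch", "jnz": "branch", "ret": "branch",
}
FEATURE_ORDER = ["transfer", "arith", "call", "branch"]


def block_features(instructions):
    """Count instructions per category for one basic block."""
    counts = Counter()
    for ins in instructions:
        tokens = ins.split()
        mnemonic = tokens[0].lower() if tokens else ""
        # Unknown or empty mnemonics fall into "other", which is not emitted.
        counts[CATEGORIES.get(mnemonic, "other")] += 1
    return [float(counts[name]) for name in FEATURE_ORDER]


def embed_function(blocks, edges, rounds=2):
    """Return a fixed-length vector for one function's CFG.

    blocks: dict mapping block id -> list of disassembled instruction strings
    edges:  list of (src, dst) control-flow edges between block ids
    In each round, a block's state becomes its own features plus half the sum
    of its neighbours' previous states (edges are treated as undirected here).
    The function vector is the element-wise sum of the final block states.
    """
    feats = {b: block_features(ins) for b, ins in blocks.items()}
    neighbours = {b: [] for b in blocks}
    for src, dst in edges:
        neighbours[src].append(dst)
        neighbours[dst].append(src)

    state = dict(feats)
    for _ in range(rounds):
        new_state = {}
        for b in blocks:
            agg = [0.0] * len(FEATURE_ORDER)
            for n in neighbours[b]:
                agg = [a + s for a, s in zip(agg, state[n])]
            new_state[b] = [f + 0.5 * a for f, a in zip(feats[b], agg)]
        state = new_state

    return [sum(state[b][i] for b in blocks) for i in range(len(FEATURE_ORDER))]


# Toy function with two basic blocks and one control-flow edge.
example_blocks = {
    0: ["push ebp", "mov ebp, esp", "cmp eax, 0", "jz L1"],
    1: ["call foo", "ret"],
}
example_edges = [(0, 1)]
vector = embed_function(example_blocks, example_edges, rounds=1)
```

Vectors of this kind could be stored per APT group and passed to a classifier. A learned embedding, as the abstract's neural network model implies, would replace the fixed aggregation rule with trained weights.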

