GLDOC: detection of implicitly malicious MS-Office documents using graph convolutional networks

IF 3.7 | Zone 4, Computer Science | JCR Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS

Cybersecurity Pub Date : 2024-07-25 DOI:10.1186/s42400-024-00243-7

Wenbo Wang, Peng Yi, Taotao Kou, Weitao Han, Chengyu Wang


Citations: 0

Abstract

Malicious MS-Office documents have become one of the most effective attack vectors in APT attacks. Although many protection mechanisms exist, they have proved easy to bypass, and existing detection methods perform poorly on malicious documents that exploit unknown vulnerabilities or exhibit few malicious behaviors. In this paper, we first define im-documents: vulnerable documents that show implicitly malicious behaviors and evade most public antivirus engines. We then present GLDOC, a GCN-based framework that aims to detect im-documents effectively through dynamic analysis and to cover the possible blind spots of earlier detection methods. Beyond system calls, which are the sole focus of most research, we capture all dynamic behaviors in a sandbox, take the process tree into account, and reconstruct both into graphs. With one channel learning each graph, GLDOC trains a 2-channel network together with a classifier, formulating malicious document detection as a graph learning and classification problem. Experiments show that GLDOC balances accuracy and false alarm rate well, at 95.33% and 4.33% respectively, outperforming other detection methods. In further tests in a simulated 5-day attack scenario, the proposed framework maintains stable, high detection accuracy on unknown vulnerabilities.
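The two-channel idea in the abstract can be illustrated with a minimal sketch. Everything below is an assumption for illustration only and is not the authors' architecture: the layer count, the mean-pooling readout, the concatenation of the two embeddings, the logistic classifier, the toy graphs, and the random untrained weights are all hypothetical, and both graphs are treated as undirected. One GCN channel embeds a behavior graph, the other embeds a process-tree graph, and a classifier scores the fused embedding.

```python
import math
import random

def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a square adjacency matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # >= 1 thanks to the self-loops
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def gcn_layer(a_norm, h, w):
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    z = matmul(matmul(a_norm, h), w)
    return [[max(0.0, v) for v in row] for row in z]

def graph_embedding(adj, features, weights):
    """Run one channel: stacked GCN layers, then mean pooling over nodes."""
    a_norm = normalize_adjacency(adj)
    h = features
    for w in weights:
        h = gcn_layer(a_norm, h, w)
    return [sum(col) / len(h) for col in zip(*h)]

def random_matrix(rows, cols, rng):
    """Untrained placeholder weights (hypothetical; a real model would learn these)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def predict_malicious(behavior_graph, process_graph, params):
    """Fuse both channel embeddings and apply a logistic classifier."""
    emb_b = graph_embedding(behavior_graph[0], behavior_graph[1], params["behavior"])
    emb_p = graph_embedding(process_graph[0], process_graph[1], params["process"])
    fused = emb_b + emb_p  # list concatenation of the two embeddings
    logit = sum(w * x for w, x in zip(params["clf_w"], fused)) + params["clf_b"]
    return 1.0 / (1.0 + math.exp(-logit))

rng = random.Random(0)
params = {
    "behavior": [random_matrix(2, 4, rng), random_matrix(4, 4, rng)],
    "process": [random_matrix(2, 4, rng), random_matrix(4, 4, rng)],
    "clf_w": [rng.uniform(-0.5, 0.5) for _ in range(8)],
    "clf_b": 0.0,
}

# Toy inputs: (adjacency, node features). Three behavior nodes in a path,
# and a two-node process tree (parent process and one child).
behavior_graph = ([[0, 1, 0], [1, 0, 1], [0, 1, 0]],
                  [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
process_graph = ([[0, 1], [1, 0]],
                 [[1.0, 0.0], [0.0, 1.0]])

score = predict_malicious(behavior_graph, process_graph, params)
print(f"malicious probability (untrained weights): {score:.3f}")
```

Because the weights are random, the printed score carries no meaning; the sketch only shows how two graph channels can feed a single classifier.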
