Discovering New Malware Families Using a Linguistic-Based Macros Detection Method

2018 Sixth International Symposium on Computing and Networking Workshops (CANDARW) Pub Date : 2018-11-01 DOI:10.1109/CANDARW.2018.00085

Hiroya Miura, M. Mimura, Hidema Tanaka

{"title":"Discovering New Malware Families Using a Linguistic-Based Macros Detection Method","authors":"Hiroya Miura, M. Mimura, Hidema Tanaka","doi":"10.1109/CANDARW.2018.00085","DOIUrl":null,"url":null,"abstract":"In recent years, the number of targeted email attacks using malicious macros has been increasing. Malicious macros are malware which is written in Visual Basic for Application. Since much source code of malicious macros is highly obfuscated, the source code contains many obfuscated words such as random numbers or strings. Today, new malware families are frequently discovered. To detect unseen malicious macros, previous work proposed a method using natural language techniques. The proposed method separates macro's source code into words, and detects malicious macros based on the appearance frequency. This method could detect unseen malicious macros. However, the unseen malicious macros might consist of known malware families. Furthermore, the mechanism and effectiveness of this method are not clear. In particular, detecting new malware families is a top priority. Hence, this paper reveals the mechanism and effectiveness of this method to detect new malware families. Our experiment shows that using only malicious macros for feature extraction and consolidating obfuscated words into a word were effective. We confirmed this method could discover 89% of new malware families.","PeriodicalId":329439,"journal":{"name":"2018 Sixth International Symposium on Computing and Networking Workshops (CANDARW)","volume":"332 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 Sixth International Symposium on Computing and Networking Workshops (CANDARW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CANDARW.2018.00085","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 7

Abstract

In recent years, the number of targeted email attacks using malicious macros has been increasing. Malicious macros are malware which is written in Visual Basic for Application. Since much source code of malicious macros is highly obfuscated, the source code contains many obfuscated words such as random numbers or strings. Today, new malware families are frequently discovered. To detect unseen malicious macros, previous work proposed a method using natural language techniques. The proposed method separates macro's source code into words, and detects malicious macros based on the appearance frequency. This method could detect unseen malicious macros. However, the unseen malicious macros might consist of known malware families. Furthermore, the mechanism and effectiveness of this method are not clear. In particular, detecting new malware families is a top priority. Hence, this paper reveals the mechanism and effectiveness of this method to detect new malware families. Our experiment shows that using only malicious macros for feature extraction and consolidating obfuscated words into a word were effective. We confirmed this method could discover 89% of new malware families.

查看原文本刊更多论文

使用基于语言的宏检测方法发现新的恶意软件家族

近年来，使用恶意宏的针对性邮件攻击数量不断增加。恶意宏是用Visual Basic for Application编写的恶意软件。由于许多恶意宏的源代码是高度混淆的，因此源代码包含许多混淆的单词，例如随机数或字符串。今天，新的恶意软件家族经常被发现。为了检测看不见的恶意宏，以前的工作提出了一种使用自然语言技术的方法。该方法将宏的源代码分离成单词，并根据出现频率检测恶意宏。此方法可以检测不可见的恶意宏。然而，看不见的恶意宏可能由已知的恶意软件家族组成。此外，该方法的机制和有效性尚不清楚。特别是，检测新的恶意软件家族是当务之急。因此，本文揭示了该方法检测新恶意软件家族的机制和有效性。我们的实验表明，仅使用恶意宏进行特征提取并将混淆的单词合并为一个单词是有效的。我们证实这种方法可以发现89%的新恶意软件家族。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

2018 Sixth International Symposium on Computing and Networking Workshops (CANDARW)

自引率

0.00%

发文量