FORECAST: skimming off the malware cream

M. Neugschwandtner, P. M. Comparetti, G. Jacob, Christopher Krügel
{"title":"FORECAST: skimming off the malware cream","authors":"M. Neugschwandtner, P. M. Comparetti, G. Jacob, Christopher Krügel","doi":"10.1145/2076732.2076735","DOIUrl":null,"url":null,"abstract":"To handle the large number of malware samples appearing in the wild each day, security analysts and vendors employ automated tools to detect, classify and analyze malicious code. Because malware is typically resistant to static analysis, automated dynamic analysis is widely used for this purpose. Executing malicious software in a controlled environment while observing its behavior can provide rich information on a malware's capabilities. However, running each malware sample even for a few minutes is expensive. For this reason, malware analysis efforts need to select a subset of samples for analysis. To date, this selection has been performed either randomly or using techniques focused on avoiding re-analysis of polymorphic malware variants [41, 23].\n In this paper, we present a novel approach to sample selection that attempts to maximize the total value of the information obtained from analysis, according to an application-dependent scoring function. To this end, we leverage previous work on behavioral malware clustering [14] and introduce a machine-learning-based system that uses all statically-available information to predict into which behavioral class a sample will fall, before the sample is actually executed. We discuss scoring functions tailored at two practical applications of large-scale dynamic analysis: the compilation of network blacklists of command and control servers and the generation of remediation procedures for malware infections. We implement these techniques in a tool called ForeCast. Large-scale evaluation on over 600,000 malware samples shows that our prototype can increase the amount of potential command and control servers detected by up to 137% over a random selection strategy and 54% over a selection strategy based on sample diversity.","PeriodicalId":397003,"journal":{"name":"Asia-Pacific Computer Systems Architecture Conference","volume":"25 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"35","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Asia-Pacific Computer Systems Architecture Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2076732.2076735","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 35

Abstract

To handle the large number of malware samples appearing in the wild each day, security analysts and vendors employ automated tools to detect, classify and analyze malicious code. Because malware is typically resistant to static analysis, automated dynamic analysis is widely used for this purpose. Executing malicious software in a controlled environment while observing its behavior can provide rich information on a malware's capabilities. However, running each malware sample even for a few minutes is expensive. For this reason, malware analysis efforts need to select a subset of samples for analysis. To date, this selection has been performed either randomly or using techniques focused on avoiding re-analysis of polymorphic malware variants [41, 23]. In this paper, we present a novel approach to sample selection that attempts to maximize the total value of the information obtained from analysis, according to an application-dependent scoring function. To this end, we leverage previous work on behavioral malware clustering [14] and introduce a machine-learning-based system that uses all statically-available information to predict into which behavioral class a sample will fall, before the sample is actually executed. We discuss scoring functions tailored at two practical applications of large-scale dynamic analysis: the compilation of network blacklists of command and control servers and the generation of remediation procedures for malware infections. We implement these techniques in a tool called ForeCast. Large-scale evaluation on over 600,000 malware samples shows that our prototype can increase the amount of potential command and control servers detected by up to 137% over a random selection strategy and 54% over a selection strategy based on sample diversity.
预测:清除恶意软件
为了处理每天出现的大量恶意软件样本,安全分析师和供应商使用自动化工具来检测、分类和分析恶意代码。由于恶意软件通常抵抗静态分析,因此自动动态分析被广泛用于此目的。在受控环境中执行恶意软件,同时观察其行为,可以提供有关恶意软件功能的丰富信息。然而,运行每个恶意软件样本即使几分钟也是昂贵的。由于这个原因,恶意软件分析工作需要选择一个样本子集进行分析。迄今为止,这种选择要么是随机进行的,要么是使用技术来避免对多态恶意软件变体进行重新分析[41,23]。在本文中,我们提出了一种新的样本选择方法,根据应用相关的评分函数,试图最大化从分析中获得的信息的总价值。为此,我们利用之前在行为恶意软件聚类方面的工作[14],并引入了一个基于机器学习的系统,该系统使用所有静态可用信息来预测样本将落入哪个行为类,然后再实际执行样本。我们讨论了针对大规模动态分析的两个实际应用定制的评分功能:命令和控制服务器的网络黑名单的编制以及恶意软件感染的修复程序的生成。我们在一个叫做ForeCast的工具中实现了这些技术。对超过60万个恶意软件样本的大规模评估表明,我们的原型可以比随机选择策略增加多达137%的潜在命令和控制服务器检测数量,比基于样本多样性的选择策略增加54%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信