Global and local prevalence weighting of missed attack sample impacts for endpoint security product comparative detection testing

2014 9th International Conference on Malicious and Unwanted Software: The Americas (MALWARE). Pub Date: 2014-10-01. DOI: 10.1109/MALWARE.2014.6999413

A. Clementi, Peter Stelzhammer, F. C. Osorio


Citations: 2

Abstract

In the past, several methods have been used to select the malware attack samples, the so-called Stimulus Workload (SW), used in malware-detection tests of endpoint security products. In the selection process one must be aware that some of the selected samples pose a greater threat to users than others, because they are more widespread and hence more likely to affect a user. Some may target a specific company or user base but present less risk to other users. Other malware attack samples may be found only on specific websites, affect only specific countries or regions, or be relevant only to particular operating system versions or interface languages (English, German, Chinese, and so forth). Because of this variability, the selection of samples can and will skew the results dramatically. For this reason, over the last several years the Security Effectiveness Measurement Community & Ecosystem (SEMCE) has begun adopting a test methodology that requires strict adherence to standards. The primary reason for adopting this methodology, first described in [1], is to assure the reproducibility and reliability of test results. The methodology requires that the stimulus workload be a reliable proxy for the actual environment that the products are expected to encounter in the wild. In this manuscript, we present results on the effectiveness of endpoint security protection products when the selected stimulus workload takes into consideration variabilities such as those described above. We call these workloads Customizable Stimulus Workloads (CSWs), and our results show great variance in the effectiveness of endpoint products when such CSWs are used. Our evaluation of endpoint security products uses a simple metric, namely missed detections. The generation of the CSWs depended heavily on Microsoft's global telemetry data gathered in 2013 and 2014 for Microsoft Windows updates.
Twenty-two (22) endpoint security products were evaluated using this methodology. The results show great variability across vendors between the miss ratio, meaning the number of malware samples a product failed to detect, and the customer impact coefficient. For example, two endpoint protection products with similar miss percentages of 0.2% and 0.4% had dramatically different customer impact coefficients of 0.001209 and 0.018903, respectively. That is, when miss percentages were normalized for factors such as prevalence, operating system, language, and so forth, systems protected by one vendor were 18 times more likely to suffer an infection than their counterparts.
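
The abstract does not give the formula for the customer impact coefficient, so the following is only a minimal sketch of how a prevalence-weighted miss score might differ from a plain miss percentage. It assumes, purely for illustration, that impact is the prevalence mass of the missed samples divided by the prevalence mass of the whole workload, computed per scope (global or a single region). The names `Sample`, `miss_percentage`, `weighted_impact`, the scope keys, and all numbers are hypothetical and are not taken from the paper or from Microsoft telemetry.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    sample_id: str
    # Hypothetical prevalence weights per scope, e.g. {"global": 1000, "DE": 5}.
    prevalence: dict = field(default_factory=dict)


def miss_percentage(samples, missed_ids):
    """Unweighted share of samples the product failed to detect, in percent."""
    missed = sum(1 for s in samples if s.sample_id in missed_ids)
    return 100.0 * missed / len(samples)


def weighted_impact(samples, missed_ids, scope="global"):
    """Assumed impact score: missed prevalence mass / total prevalence mass."""
    total = sum(s.prevalence.get(scope, 0.0) for s in samples)
    if total == 0:
        raise ValueError(f"no prevalence data for scope {scope!r}")
    missed = sum(
        s.prevalence.get(scope, 0.0) for s in samples if s.sample_id in missed_ids
    )
    return missed / total


# Made-up workload: s0 is widespread globally, s3 mostly in one region ("DE").
workload = [
    Sample("s0", {"global": 1000, "DE": 5}),
    Sample("s1", {"global": 500, "DE": 400}),
    Sample("s2", {"global": 100, "DE": 90}),
    Sample("s3", {"global": 2, "DE": 300}),
    Sample("s4", {"global": 1, "DE": 5}),
]

# Two hypothetical products that miss the same number of samples.
product_a_missed = {"s3", "s4"}
product_b_missed = {"s0", "s4"}

for name, missed in (("A", product_a_missed), ("B", product_b_missed)):
    print(
        name,
        f"miss%={miss_percentage(workload, missed):.1f}",
        f"global={weighted_impact(workload, missed, 'global'):.6f}",
        f"DE={weighted_impact(workload, missed, 'DE'):.6f}",
    )
```

In this toy data both products have the same miss percentage, yet product B's global impact is far higher because it misses the most prevalent sample, while product A fares worse in the regional scope. This mirrors the abstract's point that similar miss percentages can hide very different customer impact; the actual weighting used by the authors may differ.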
