探索(恶意)软件下载的长尾

2017 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN) Pub Date : 2017-06-01 DOI:10.1109/DSN.2017.19

Babak Rahbarinia, Marco Balduzzi, R. Perdisci

{"title":"探索(恶意)软件下载的长尾","authors":"Babak Rahbarinia, Marco Balduzzi, R. Perdisci","doi":"10.1109/DSN.2017.19","DOIUrl":null,"url":null,"abstract":"In this paper, we present a large-scale study of global trends in software download events, with an analysis of both benign and malicious downloads, and a categorization of events for which no ground truth is currently available. Our measurement study is based on a unique, real-world dataset collected at Trend Micro containing more than 3 million in-the-wild web-based software download events involving hundreds of thousands of Internet machines, collected over a period of seven months. Somewhat surprisingly, we found that despite our best efforts and the use of multiple sources of ground truth, more than 83% of all downloaded software files remain unknown, i.e. cannot be classified as benign or malicious, even two years after they were first observed. If we consider the number of machines that have downloaded at least one unknown file, we find that more than 69% of the entire machine/user population downloaded one or more unknown software file. Because the accuracy of malware detection systems reported in the academic literature is typically assessed only over software files that can be labeled, our findings raise concerns on their actual effectiveness in large-scale real-world deployments, and on their ability to defend the majority of Internet machines from infection. To better understand what these unknown software files may be, we perform a detailed analysis of their properties. We then explore whether it is possible to extend the labeling of software downloads by building a rule-based system that automatically learns from the available ground truth and can be used to identify many more benign and malicious files with very high confidence. This allows us to greatly expand the number of software files that can be labeled with high confidence, thus providing results that can benefit the evaluation of future malware detection systems.","PeriodicalId":426928,"journal":{"name":"2017 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":"{\"title\":\"Exploring the Long Tail of (Malicious) Software Downloads\",\"authors\":\"Babak Rahbarinia, Marco Balduzzi, R. Perdisci\",\"doi\":\"10.1109/DSN.2017.19\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we present a large-scale study of global trends in software download events, with an analysis of both benign and malicious downloads, and a categorization of events for which no ground truth is currently available. Our measurement study is based on a unique, real-world dataset collected at Trend Micro containing more than 3 million in-the-wild web-based software download events involving hundreds of thousands of Internet machines, collected over a period of seven months. Somewhat surprisingly, we found that despite our best efforts and the use of multiple sources of ground truth, more than 83% of all downloaded software files remain unknown, i.e. cannot be classified as benign or malicious, even two years after they were first observed. If we consider the number of machines that have downloaded at least one unknown file, we find that more than 69% of the entire machine/user population downloaded one or more unknown software file. Because the accuracy of malware detection systems reported in the academic literature is typically assessed only over software files that can be labeled, our findings raise concerns on their actual effectiveness in large-scale real-world deployments, and on their ability to defend the majority of Internet machines from infection. To better understand what these unknown software files may be, we perform a detailed analysis of their properties. We then explore whether it is possible to extend the labeling of software downloads by building a rule-based system that automatically learns from the available ground truth and can be used to identify many more benign and malicious files with very high confidence. This allows us to greatly expand the number of software files that can be labeled with high confidence, thus providing results that can benefit the evaluation of future malware detection systems.\",\"PeriodicalId\":426928,\"journal\":{\"name\":\"2017 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)\",\"volume\":\"42 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"13\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/DSN.2017.19\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DSN.2017.19","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 13

摘要

在本文中，我们对软件下载事件的全球趋势进行了大规模研究，对良性和恶意下载进行了分析，并对目前没有真实情况的事件进行了分类。我们的测量研究是基于趋势科技收集的一个独特的、真实的数据集，其中包含超过300万的基于网络的软件下载事件，涉及数十万台互联网机器，收集时间为7个月。有些令人惊讶的是，我们发现，尽管我们尽了最大的努力，并使用了多种来源的地面真相，但超过83%的下载软件文件仍然未知，即无法归类为良性或恶意，甚至在它们首次被发现两年后。如果我们考虑至少下载了一个未知文件的机器数量，我们发现超过69%的整个机器/用户群体下载了一个或多个未知软件文件。因为在学术文献中报告的恶意软件检测系统的准确性通常只对可以标记的软件文件进行评估，我们的研究结果引起了人们对它们在大规模现实世界部署中的实际有效性的关注，以及它们保护大多数互联网机器免受感染的能力。为了更好地了解这些未知的软件文件可能是什么，我们对它们的属性进行了详细的分析。然后，我们探索是否有可能通过建立一个基于规则的系统来扩展软件下载的标签，该系统可以自动从可用的基础事实中学习，并且可以非常高的置信度用于识别更多的良性和恶意文件。这使我们能够极大地扩展可以高置信度标记的软件文件的数量，从而提供有利于评估未来恶意软件检测系统的结果。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Exploring the Long Tail of (Malicious) Software Downloads

In this paper, we present a large-scale study of global trends in software download events, with an analysis of both benign and malicious downloads, and a categorization of events for which no ground truth is currently available. Our measurement study is based on a unique, real-world dataset collected at Trend Micro containing more than 3 million in-the-wild web-based software download events involving hundreds of thousands of Internet machines, collected over a period of seven months. Somewhat surprisingly, we found that despite our best efforts and the use of multiple sources of ground truth, more than 83% of all downloaded software files remain unknown, i.e. cannot be classified as benign or malicious, even two years after they were first observed. If we consider the number of machines that have downloaded at least one unknown file, we find that more than 69% of the entire machine/user population downloaded one or more unknown software file. Because the accuracy of malware detection systems reported in the academic literature is typically assessed only over software files that can be labeled, our findings raise concerns on their actual effectiveness in large-scale real-world deployments, and on their ability to defend the majority of Internet machines from infection. To better understand what these unknown software files may be, we perform a detailed analysis of their properties. We then explore whether it is possible to extend the labeling of software downloads by building a rule-based system that automatically learns from the available ground truth and can be used to identify many more benign and malicious files with very high confidence. This allows us to greatly expand the number of software files that can be labeled with high confidence, thus providing results that can benefit the evaluation of future malware detection systems.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2017 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)

自引率

0.00%

发文量