Toward Large-Scale Vulnerability Discovery using Machine Learning

Proceedings of the Sixth ACM Conference on Data and Application Security and Privacy. Pub Date: 2016-03-09. DOI: 10.1145/2857705.2857720

Gustavo Grieco, G. Grinblat, Lucas C. Uzal, Sanjay Rawat, Josselin Feist, L. Mounier


Citations: 180

Abstract

With the sustained growth of software complexity, finding security vulnerabilities in operating systems has become a necessity. Operating systems now ship with thousands of binary executables. Unfortunately, methodologies and tools for testing programs at OS scale within a limited time budget are still missing. In this paper we present an approach that uses lightweight static and dynamic features, together with machine learning techniques, to predict whether a test case is likely to contain a software vulnerability. To show the effectiveness of our approach, we set up a large experiment to detect easily exploitable memory corruptions using 1039 Debian programs obtained from its bug tracker, collected 138,308 unique execution traces, and statically explored 76,083 different subsequences of function calls. We managed to predict with reasonable accuracy which programs contained dangerous memory corruptions. We also developed and implemented VDiscover, a tool that uses state-of-the-art machine learning techniques to predict vulnerabilities in test cases. The tool will be released as open source to encourage research on vulnerability discovery at a large scale, together with VDiscovery, a public dataset that collects raw analyzed data.
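The abstract mentions subsequences of function calls as a lightweight feature but does not spell out the feature encoding or the classifier. The sketch below is only an illustration of the general idea, not VDiscover's implementation: it assumes traces are lists of function names, counts contiguous call n-grams, and swaps in a simple nearest-centroid rule with cosine similarity as the classifier. All function names (`call_ngrams`, `centroid`, `cosine`, `predict`) and the toy traces are hypothetical.

```python
from collections import Counter
from math import sqrt
from typing import Dict, List, Tuple

Gram = Tuple[str, ...]


def call_ngrams(trace: List[str], n: int = 2) -> Counter:
    """Count contiguous length-n subsequences of calls in one trace."""
    return Counter(tuple(trace[i:i + n]) for i in range(len(trace) - n + 1))


def centroid(vectors: List[Counter]) -> Dict[Gram, float]:
    """Average a list of n-gram count vectors into one mean vector."""
    total: Counter = Counter()
    for vec in vectors:
        total.update(vec)
    return {gram: count / len(vectors) for gram, count in total.items()}


def cosine(a: Dict[Gram, float], b: Dict[Gram, float]) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(value * b.get(gram, 0.0) for gram, value in a.items())
    norm_a = sqrt(sum(value * value for value in a.values()))
    norm_b = sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict(trace: List[str],
            flagged: Dict[Gram, float],
            clean: Dict[Gram, float],
            n: int = 2) -> bool:
    """Return True if the trace is closer to the flagged centroid."""
    vec = call_ngrams(trace, n)
    return cosine(vec, flagged) > cosine(vec, clean)


# Toy, hand-written traces (assumed data, not taken from the VDiscovery dataset).
flagged_traces = [
    ["fopen", "fread", "strcpy", "free"],
    ["fopen", "fread", "strcpy", "strcat"],
]
clean_traces = [
    ["fopen", "fread", "strncpy", "fclose"],
    ["fopen", "fgets", "strncpy", "fclose"],
]

flagged_centroid = centroid([call_ngrams(t) for t in flagged_traces])
clean_centroid = centroid([call_ngrams(t) for t in clean_traces])

if __name__ == "__main__":
    new_trace = ["fopen", "fread", "strcpy", "free"]
    print(predict(new_trace, flagged_centroid, clean_centroid))
```

A real pipeline would need far more traces, a held-out evaluation set, and a stronger model; the nearest-centroid rule is used here only because it fits in a few lines without external libraries.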

