A Statistical Approach to Detecting Low-Throughput Exfiltration through the Domain Name System Protocol

Proceedings of the 2020 Workshop on DYnamic and Novel Advances in Machine Learning and Intelligent Cyber Security. Pub Date: 2020-12-07. DOI: 10.1145/3477997.3478007

Emily Joback, Leslie Shing, Kenneth Alperin, Steven R. Gomez, Steven Jorgensen, Gabe Elkin


Citations: 0

Abstract


A Statistical Approach to Detecting Low-Throughput Exfiltration through the Domain Name System Protocol

The Domain Name System (DNS) is a critical network protocol that resolves human-readable domain names to IP addresses. Because the Internet cannot function without it, DNS traffic is typically allowed to bypass firewalls and other security services. The protocol was also not designed for data transfer, so it is not monitored as heavily as other protocols. Together, these properties make DNS an ideal channel for covert data exfiltration by a malicious actor. A typical company or organization sees tens to hundreds of thousands of DNS queries a day, and an analyst cannot feasibly sift through a dataset of that size and investigate every domain to confirm its legitimacy. An attacker can exploit this to hide traces of malicious activity within a small percentage of total traffic.

Recent research in this field has focused on applying supervised machine learning (ML) or one-class classifier techniques to build a predictive model that determines whether a DNS query is being used for exfiltration; however, these models require labelled datasets. Supervised models need both legitimate and malicious samples, yet they are difficult to train because realistic network datasets containing known DNS exploits are rarely made public. Prior studies have instead used synthetic, curated datasets, which can introduce bias. In addition, some studies suggest that ML algorithms perform worse when the two classes are heavily imbalanced, as is the case for DNS exfiltration datasets. One-class classifiers, for their part, require a dataset known to be free of exfiltration data.

Our model aims to circumvent these issues by identifying cases of DNS exfiltration within a network without requiring a labelled or curated dataset. By automatically detecting traffic indicative of exfiltration, our approach eliminates the need for a network analyst to sift through a high volume of DNS queries.
