Aiding Intrusion Analysis Using Machine Learning

2013 12th International Conference on Machine Learning and Applications Pub Date : 2013-12-04 DOI:10.1109/ICMLA.2013.103

Loai Zomlot, S. C. Sundaramurthy, Doina Caragea, Xinming Ou


Citations: 19

Abstract

Intrusion analysis, i.e., the process of combing through IDS alerts and audit logs to identify real successful and attempted attacks, remains a difficult problem in practical network security defense. The major contributing cause is the high false-positive rate of the sensors that IDSs use to detect malicious activity. The goal of our work is to examine whether a machine-learned classifier can help a human analyst filter out non-interesting scenarios reported by an IDS alert correlator, thereby saving the analyst's time. This research is conducted in the open-source SnIPS intrusion analysis framework. By observing the output of SnIPS running on our departmental network, we found that an analyst would need to perform repetitive tasks to prune the false positives from the correlation graphs it produces. We hypothesized that such repetitive tasks can yield (limited) labeled data that enable a machine learning-based approach to prune SnIPS' output based on human analysts' feedback, much like spam filters that learn from users' past judgments to filter email. Our goal is to classify the correlation graphs produced by SnIPS as "interesting" or "non-interesting", where "interesting" means that a human analyst would want to conduct further analysis of the events. We spent a significant amount of time manually labeling SnIPS' output correlations based on this criterion, and built prediction models using both supervised and semi-supervised learning approaches. Our experiments revealed a number of observations that give insight into the pitfalls and challenges of applying machine learning to intrusion analysis. The experimental results also indicate that semi-supervised learning is a promising approach toward practical machine learning-based tools that can aid human analysts when only a limited amount of labeled data is available.
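
The abstract does not say which semi-supervised algorithm or which graph features were used, so the following is only a minimal sketch of one common semi-supervised scheme, self-training, applied to the "interesting" versus "non-interesting" labeling task. Everything in it is an assumption made for illustration: the two numeric features per correlation graph (for example, alert count and number of distinct hosts), the nearest-centroid base classifier, the `min_margin` confidence threshold, and all function names. None of it is taken from SnIPS or from the paper.

```python
from math import dist


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def fit_centroids(labeled):
    """labeled: list of (features, label). Returns {label: centroid}."""
    by_label = {}
    for x, y in labeled:
        by_label.setdefault(y, []).append(x)
    return {y: centroid(xs) for y, xs in by_label.items()}


def predict(centroids, x):
    """Return (label, margin): the nearest class and its distance gap
    to the runner-up class. Requires at least two classes."""
    ranked = sorted((dist(x, c), y) for y, c in centroids.items())
    (d0, y0), (d1, _) = ranked[0], ranked[1]
    return y0, d1 - d0


def self_train(labeled, unlabeled, min_margin=1.0, max_rounds=10):
    """Self-training: repeatedly pseudo-label the unlabeled graphs the
    current model is confident about, then refit on the enlarged set."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        centroids = fit_centroids(labeled)
        confident, rest = [], []
        for x in pool:
            y, margin = predict(centroids, x)
            (confident if margin >= min_margin else rest).append((x, y))
        if not confident:
            break  # nothing new to learn from; stop early
        labeled.extend(confident)
        pool = [x for x, _ in rest]
    return fit_centroids(labeled)


# Hypothetical data: (alert_count, distinct_hosts) per correlation graph.
analyst_labeled = [
    ((2.0, 1.0), "non-interesting"),
    ((3.0, 1.0), "non-interesting"),
    ((9.0, 6.0), "interesting"),
    ((10.0, 7.0), "interesting"),
]
unlabeled_graphs = [(2.5, 1.5), (8.0, 5.0), (9.5, 6.5), (1.0, 1.0), (6.0, 4.0)]

model = self_train(analyst_labeled, unlabeled_graphs)
label, margin = predict(model, (9.0, 5.0))
print(label, round(margin, 2))
```

The point of the sketch is the loop structure: a small set of analyst-labeled graphs seeds the model, and unlabeled graphs are folded in only when the model's prediction clears a confidence threshold. A low threshold risks reinforcing early mistakes, which is one of the pitfalls a practical tool would have to guard against.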

