Aiding Intrusion Analysis Using Machine Learning

Loai Zomlot, S. C. Sundaramurthy, Doina Caragea, Xinming Ou
Published in: 2013 12th International Conference on Machine Learning and Applications (ICMLA), 2013-12-04
DOI: 10.1109/ICMLA.2013.103 (https://doi.org/10.1109/ICMLA.2013.103)
Citations: 19

Abstract

Intrusion analysis, i.e., the process of combing through IDS alerts and audit logs to identify real attacks, both successful and attempted, remains a difficult problem in practical network security defense. The major contributing cause is the high false-positive rate of the sensors that IDSs use to detect malicious activity. The goal of our work is to examine whether a machine-learned classifier can help a human analyst filter out non-interesting scenarios reported by an IDS alert correlator, and so save the analyst's time. This research is conducted in the open-source SnIPS intrusion analysis framework. By observing the output of SnIPS running on our departmental network, we found that an analyst would need to perform repetitive tasks to prune the false positives from the correlation graphs it produces. We hypothesized that such repetitive tasks can yield (limited) labeled data, which in turn enables a machine learning-based approach that prunes SnIPS' output based on the human analysts' feedback, much as spam filters learn from users' past judgments to filter email. Our goal is to classify the correlation graphs produced by SnIPS as "interesting" or "non-interesting", where "interesting" means that a human analyst would want to conduct further analysis of the events. We spent a significant amount of time manually labeling SnIPS' output correlations according to this criterion, and built prediction models using both supervised and semi-supervised learning approaches. Our experiments revealed a number of observations that give insight into the pitfalls and challenges of applying machine learning to intrusion analysis. The experimental results also indicate that semi-supervised learning is a promising approach to practical machine learning-based tools that can aid human analysts when only a limited amount of labeled data is available.
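The semi-supervised setting described above, where a few analyst-labeled correlation graphs sit alongside many unlabeled ones, can be illustrated with a minimal self-training sketch. Everything in it is an assumption made for illustration: the abstract does not say which features or learning algorithms were used, so the two-dimensional feature vectors, the nearest-centroid classifier, and the names `self_train`, `predict`, and `min_margin` are hypothetical stand-ins rather than the authors' method or any part of SnIPS.

```python
from math import dist


def centroids(points, labels):
    """Return the mean feature vector of each class."""
    sums, counts = {}, {}
    for p, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(p))
        for i, v in enumerate(p):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: tuple(v / counts[y] for v in acc) for y, acc in sums.items()}


def predict(cents, p):
    """Return (label, margin) for point p.

    The margin is the gap between the distances to the two nearest
    centroids; a small margin means an ambiguous prediction.
    """
    ranked = sorted((dist(p, c), y) for y, c in cents.items())
    margin = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else float("inf")
    return ranked[0][1], margin


def self_train(labeled, unlabeled, min_margin=1.0, max_rounds=10):
    """Self-training: repeatedly pseudo-label confident unlabeled points.

    labeled   -- list of (feature_vector, label) pairs from the analyst
    unlabeled -- list of feature vectors without labels
    """
    points = [p for p, _ in labeled]
    labels = [y for _, y in labeled]
    pool = list(unlabeled)
    for _ in range(max_rounds):
        cents = centroids(points, labels)
        confident, rest = [], []
        for p in pool:
            y, margin = predict(cents, p)
            (confident if margin >= min_margin else rest).append((p, y))
        if not confident:
            break  # nothing left that the model is sure about
        for p, y in confident:
            points.append(p)
            labels.append(y)
        pool = [p for p, _ in rest]
    return centroids(points, labels)


# Hypothetical features per correlation graph, e.g. (alert count, host count).
labeled = [((1.0, 1.0), "non-interesting"), ((9.0, 9.0), "interesting")]
unlabeled = [(2.0, 1.0), (8.0, 9.0), (5.0, 5.0)]
model = self_train(labeled, unlabeled)
```

In this toy run the two clear-cut unlabeled points are absorbed into the training set, while the ambiguous point at (5.0, 5.0) never reaches the confidence margin and stays unlabeled. The margin threshold is the main design choice: set too low, early mistakes are reinforced in later rounds, which is one commonly cited pitfall of self-training.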