A dataset generator for next generation system call host intrusion detection systems

Marcus Pendleton, Shouhuai Xu
Published in: MILCOM 2017 - 2017 IEEE Military Communications Conference (MILCOM), 2017-10-01
DOI: 10.1109/MILCOM.2017.8170835 (https://doi.org/10.1109/MILCOM.2017.8170835)
Citations: 15

Abstract

Over the years, system calls (syscalls) have become an increasingly popular data source for host intrusion detection systems (HIDS), partly because of their strong security semantics. Because syscalls conform to a program's control-flow graph, a deviation in a syscall sequence may imply a deviation in that graph, which is useful for detecting control-flow hijacking attacks. Additionally, with the exception of some denial-of-service attacks, malware must use syscalls to provide any utility to the attacker. Because all syscalls are observable from the kernel, evasion is difficult for attackers under a syscall HIDS. Given this suitability, many syscall-based approaches have been proposed. However, the available syscall datasets are not always well suited to these approaches or to emerging analytics techniques, which may need additional structural or contextual information about syscalls in their decision engines. Furthermore, the flatness of previous datasets often pigeonholes solutions into those limited by that data view. It is also burdensome for researchers to generate their own custom datasets. In this work, we propose an extensible syscall dataset generator that includes structural and limited contextual information about syscalls, yet allows researchers to easily add their own features so they can develop and evaluate their systems more quickly. Our dataset generator can aid researchers in widening the solution space for syscall HIDS.
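To make the abstract's point about flat datasets concrete, the following is a minimal sketch, not the authors' generator. It assumes a hypothetical `SyscallRecord` type with illustrative field names (`args`, `retval`, `pid`, `extra`) and uses a simple sliding-window comparison over syscall names as a stand-in for a sequence-based detector. None of these names come from the paper.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set, Tuple


@dataclass
class SyscallRecord:
    """One observed syscall. All field names are hypothetical choices."""
    name: str                       # syscall name, e.g. "open"
    args: Tuple[str, ...] = ()      # structural information: call arguments
    retval: int = 0                 # structural information: return value
    pid: int = 0                    # contextual information: calling process
    extra: Dict[str, Any] = field(default_factory=dict)  # researcher-added features


def ngrams(names: List[str], n: int) -> Set[Tuple[str, ...]]:
    """Return the set of length-n windows over a flat sequence of syscall names."""
    return {tuple(names[i:i + n]) for i in range(len(names) - n + 1)}


def unseen_windows(normal: List[SyscallRecord],
                   trace: List[SyscallRecord],
                   n: int = 3) -> Set[Tuple[str, ...]]:
    """Windows in `trace` that never occur in `normal`, using names only."""
    known = ngrams([r.name for r in normal], n)
    return ngrams([r.name for r in trace], n) - known


# A normal trace and a trace whose call sequence deviates from it.
normal = [SyscallRecord(s) for s in ["open", "read", "write", "close"]]
hijacked = [SyscallRecord(s) for s in ["open", "read", "execve", "close"]]
flagged = unseen_windows(normal, hijacked)

# Same call names as `normal`, but a suspicious argument on the first call.
same_names = [
    SyscallRecord("open", args=("/etc/shadow",)),
    SyscallRecord("read"),
    SyscallRecord("write"),
    SyscallRecord("close"),
]
missed = unseen_windows(normal, same_names)
```

In this sketch `flagged` contains the two windows that include `execve`, while `missed` is empty: a name-only view cannot see the argument difference, which is the kind of limitation that structural and contextual fields are meant to address. The `extra` dictionary only illustrates one possible way a researcher might attach additional features.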