Integer Data Zero-Watermark Assisted System Calls Abstraction and Normalization for Host Based Anomaly Detection Systems

2015 IEEE 2nd International Conference on Cyber Security and Cloud Computing. Pub Date: 2015-11-03. DOI: 10.1109/CSCloud.2015.11

Waqas Haider, Jiankun Hu, Xinghuo Yu, Yi Xie


Citations: 19

Abstract

Building a representative profile of computer system behavior from system calls in Linux environments, in order to establish reliable Host-Based Anomaly Detection Systems (HADS) against the Next Generation of Attacks (NGA), is challenging for two major reasons. First, NGA leave a low footprint on host activities, so attack activities are difficult to distinguish from normal computer processes with good accuracy and within acceptable processing time. Second, there is no effective method for extracting the natural difference between the two types of system call traces (normal and abnormal). To address these problems, a semi-supervised model comprising two parts is proposed. First, to establish an unsupervised classification of computer behavior, an integer data zero-watermarking algorithm is developed to extract an abstract hidden representation of system calls. This hidden representation captures the natural difference between attack and normal computer system behavior in real time. Second, various supervised Machine Learning (ML) algorithms and normalizations are applied to the proposed hidden representation of the system calls to evaluate the semi-supervised model in HADS. Performance in terms of accuracy and processing time is evaluated on the publicly available host-based benchmark data sets ADFA-LD and KDD 98. Each data set is a collection of process traces, and each trace comprises the system calls of a process. Experimental results show that the proposed semi-supervised model outperforms existing methodologies in accuracy and processing time for the detection of both low- and high-footprint attacks.
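To make the two-stage idea concrete, the sketch below shows one way a variable-length trace of integer system call numbers could be reduced to a fixed-length vector and then normalized before being passed to a supervised classifier. The abstract does not specify the zero-watermarking algorithm, so the four summary features used here (trace length, number of distinct calls, mean, and standard deviation) are hypothetical stand-ins, not the authors' method. The function names `abstract_trace` and `min_max_normalize` and the sample traces are likewise assumptions made for illustration, and min-max scaling is only one of the normalizations the abstract may be referring to.

```python
from statistics import mean, pstdev


def abstract_trace(trace):
    """Reduce one trace (a list of integer system call numbers) to a
    fixed-length feature vector.

    ASSUMPTION: these four statistics are illustrative stand-ins; they are
    not the zero-watermark representation defined in the paper.
    """
    if not trace:
        raise ValueError("trace must contain at least one system call")
    return [
        float(len(trace)),       # trace length
        float(len(set(trace))),  # number of distinct system calls
        float(mean(trace)),      # mean system call number
        float(pstdev(trace)),    # population standard deviation
    ]


def min_max_normalize(rows):
    """Scale every feature column to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ]
        for row in rows
    ]


# Hypothetical traces; real ones would come from a data set such as ADFA-LD.
traces = [
    [6, 6, 63, 6, 42],
    [3, 3, 3, 3],
    [11, 45, 33, 192, 33, 5],
]
features = min_max_normalize([abstract_trace(t) for t in traces])
```

The rows of `features` all have the same length regardless of trace length, which is the property a supervised classifier needs. Any labeled training step would follow from here and is left out because the abstract does not name the specific algorithms used.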



