Cyber Threat Hunting Through the Use of an Isolation Forest
D. Karev, Christopher B. McCubbin, R. Vaulin
Proceedings of the 18th International Conference on Computer Systems and Technologies, 2017-06-23
DOI: https://doi.org/10.1145/3134302.3134319
Citations: 11
Abstract
Most intrusion detection systems rely on supervised machine learning algorithms, which limits them to detecting only previously recorded types of malicious attacks. This paper takes a fundamentally different approach to the problem by applying Isolation Forests, an unsupervised machine learning algorithm, in a new context. One of the algorithm's most important advantages is that it can identify and record novel intrusion models. We conduct experiments on HTTP log data to explore the algorithm's accuracy under various conditions. We empirically determine the optimal values for the algorithm's parameters and show that the standard parameters originally suggested for Isolation Forests do not always produce optimal results. Furthermore, we run a genetic algorithm to explore which HTTP features best differentiate malicious from normal data. After applying these results, we achieve an approximately 300% increase in accuracy and decrease the time required by the algorithm by nearly 50%.
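
The following is a minimal sketch of how an Isolation Forest scores points, written in plain Python. It is an illustration under stated assumptions, not the authors' implementation. The two numeric features (URL length and count of special characters), the synthetic data, and all function names are hypothetical. The defaults of 100 trees and a subsample size of 256 are the standard values from the original Isolation Forest proposal, which the abstract reports are not always optimal, so they are exposed as parameters.

```python
import math
import random

EULER_GAMMA = 0.5772156649

def c_factor(n):
    """Average path length of an unsuccessful binary search tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random value."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        # Simplification: stop here instead of retrying another feature.
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }

def path_length(point, node, depth=0):
    """Depth at which the point lands, plus an adjustment for unsplit leaves."""
    if "split" not in node:
        return depth + c_factor(node["size"])
    child = node["left"] if point[node["dim"]] < node["split"] else node["right"]
    return path_length(point, child, depth + 1)

def fit_forest(data, n_trees=100, sample_size=256, seed=0):
    """Build n_trees trees, each on a random subsample (needs at least 2 points)."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [
        build_tree(rng.sample(data, sample_size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    return trees, sample_size

def anomaly_score(point, trees, sample_size):
    """Score in (0, 1]; values near 1 mean the point is isolated quickly."""
    mean_depth = sum(path_length(point, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_depth / c_factor(sample_size))

# Hypothetical data: (URL length, special-character count) per HTTP request.
demo_rng = random.Random(42)
normal = [(demo_rng.gauss(50, 10), demo_rng.gauss(2, 1)) for _ in range(200)]
suspicious = (400.0, 40.0)  # assumed outlier, e.g. an unusually long request
data = normal + [suspicious]

trees, used_sample_size = fit_forest(data, n_trees=100, sample_size=256)
scores = [anomaly_score(p, trees, used_sample_size) for p in data]
print("suspicious request score:", round(scores[-1], 3))
print("mean normal score:", round(sum(scores[:-1]) / len(normal), 3))
```

No labels are used during fitting, which is what lets the method flag request patterns that were never recorded as attacks. Tuning `n_trees` and `sample_size`, and choosing which features go into each tuple, correspond to the parameter and feature-selection experiments the abstract describes.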