From student research to intrusion detection
N. Paul Schembari
Proceedings of the 2015 Information Security Curriculum Development Conference, 2015-10-10
DOI: https://doi.org/10.1145/2885990.2885995
Citation count: 0
Abstract
We describe a multi-year project that began as mostly undergraduate student research in data mining applied to computer forensics and has since grown into a prototype intrusion detection system (IDS). The IDS assumes delimited data that can be separated into records, such as IP packets or system calls. The data mining approach uses the bag-of-words methodology: we form a matrix model of the data, then cluster the records using k-means clustering and sparse nonnegative matrix factorization. With no training, these clusters are evaluated to determine whether they represent normal system actions or attack vectors. The prototype achieves accuracy levels similar to those of systems that use supervised learning on a specific set of data. We discuss plans for improvements through continued student investigation. Overall, we found this to be a great partnership between faculty and student research.
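The pipeline outlined in the abstract (delimited records, a bag-of-words matrix, then unsupervised clustering) can be illustrated with a minimal sketch. It is not the authors' implementation. The sample records, the comma delimiter, the function names `bag_of_words` and `kmeans`, and the farthest-point initialization are all assumptions made for illustration. Only the k-means step is shown; the sparse nonnegative matrix factorization step and the evaluation of clusters as normal or attack are omitted, because the abstract does not specify them.

```python
from collections import Counter

def bag_of_words(records, delimiter=","):
    """Return (vocab, matrix); matrix[i][j] counts token vocab[j] in record i."""
    tokenized = [[t.strip() for t in r.split(delimiter) if t.strip()]
                 for r in records]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: j for j, t in enumerate(vocab)}
    matrix = []
    for toks in tokenized:
        row = [0] * len(vocab)
        for tok, n in Counter(toks).items():
            row[index[tok]] = n
        matrix.append(row)
    return vocab, matrix

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(rows, k, max_iter=100):
    """Lloyd's k-means with a deterministic farthest-point initialization
    (an assumption chosen here so that the example is reproducible)."""
    centroids = [[float(x) for x in rows[0]]]
    while len(centroids) < k:
        far = max(rows, key=lambda r: min(sq_dist(r, c) for c in centroids))
        centroids.append([float(x) for x in far])
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(r, centroids[c]))
                      for r in rows]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [r for r, l in zip(rows, labels) if l == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical delimited records, invented for this example.
records = [
    "tcp,80,GET,200",
    "tcp,80,GET,200",
    "tcp,80,GET,404",
    "tcp,22,LOGIN,FAIL",
    "tcp,22,LOGIN,FAIL",
    "tcp,22,LOGIN,FAIL",
]
vocab, matrix = bag_of_words(records)
labels = kmeans(matrix, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

On this toy input the web-style records and the login-style records fall into separate clusters without any labeled training data. Deciding which cluster represents normal activity and which represents an attack is a separate evaluation step that this sketch does not attempt.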