{"title":"Converting PCAPs into Weka mineable data","authors":"C. A. Fowler, R. Hammell","doi":"10.1109/SNPD.2014.6888681","DOIUrl":null,"url":null,"abstract":"In today's world there is an unprecedented volume of information available to organizations of all sizes; the “information overload” problem is well documented. This problem is especially challenging in the world of network intrusion detection. In this realm, we must not only deal with sifting through vast amounts of data, but we must also do it in a timely manner even when at times we are not sure what exactly it is we are trying to find. In the grander scheme of our work we intend to demonstrate that several different data mining algorithms reporting to an overarching layer will yield more accurate results than anyone data mining application (or algorithm) acting on its own. The system will operate in the domain of offline network and computer forensic data mining, under the guidance of a hybrid intelligence/multi-agent, systems based, for interpretation and interpolation of the findings. Toward that end, in this paper we build upon earlier work, undertaking the steps required for generating and preparing suitably minable data. Specifically, we are concerned with extracting as much useful data as possible out of a PCAP (Packet capture) for importing into Weka. While a PCAP may have thousands of field/value pairs, Wireshark and tshark's csv (comma separated value) output module only renders a small percentage of these fields and their values by default. We introduce a tool of our own making which enumerates every field (with or without a value) in any PCAP and generates an ARFF (Attribute-Relation File Format - Weka default). This code represents a component of a larger application we are designing (future work) which will ingest a PCAP, semi-autonomously preprocess it and feed it into Weka for processing/mining using several different algorithms.","PeriodicalId":272932,"journal":{"name":"15th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"15th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SNPD.2014.6888681","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 13
Abstract
In today's world there is an unprecedented volume of information available to organizations of all sizes; the “information overload” problem is well documented. This problem is especially challenging in the world of network intrusion detection. In this realm, we must not only sift through vast amounts of data, but we must do so in a timely manner, even when at times we are not sure exactly what we are trying to find. In the grander scheme of our work, we intend to demonstrate that several different data mining algorithms reporting to an overarching layer will yield more accurate results than any one data mining application (or algorithm) acting on its own. The system will operate in the domain of offline network and computer forensic data mining, under the guidance of a hybrid intelligence/multi-agent-based system for interpretation and interpolation of the findings. Toward that end, in this paper we build upon earlier work, undertaking the steps required for generating and preparing suitably mineable data. Specifically, we are concerned with extracting as much useful data as possible from a PCAP (packet capture) file for importing into Weka. While a PCAP may contain thousands of field/value pairs, Wireshark's and tshark's CSV (comma-separated value) output modules render only a small percentage of these fields and their values by default. We introduce a tool of our own making that enumerates every field (with or without a value) in any PCAP and generates an ARFF (Attribute-Relation File Format, Weka's default) file. This code represents a component of a larger application we are designing (future work) that will ingest a PCAP, semi-autonomously preprocess it, and feed it into Weka for processing/mining using several different algorithms.
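The paper's tool itself is not reproduced here, but the idea it describes — dump every field a dissector knows about, declare one ARFF attribute per observed field, and mark packets that lack a field with "?" — can be sketched with off-the-shelf pieces. The sketch below is an assumption-laden illustration, not the authors' implementation: it shells out to tshark's JSON export (available in recent tshark releases, newer than the 2014 toolchain the paper used), flattens the nested layer dictionaries, and writes a simplified ARFF in which every attribute is a STRING. Script name, helper names, and the choice of JSON over PDML are all illustrative.

```python
#!/usr/bin/env python3
"""Illustrative sketch (not the authors' tool): enumerate every field in a PCAP
via tshark's JSON export and emit a minimal Weka ARFF file. Assumes a recent
tshark is on the PATH; all attributes are declared STRING and packets missing
a field get the ARFF unknown marker '?'."""

import json
import subprocess
import sys


def pcap_to_records(pcap_path):
    """Run tshark and return one flat {field_name: value} dict per packet."""
    out = subprocess.run(
        ["tshark", "-r", pcap_path, "-T", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    records = []
    for pkt in json.loads(out):
        flat = {}
        _flatten(pkt.get("_source", {}).get("layers", {}), flat)
        records.append(flat)
    return records


def _flatten(node, out, name=""):
    """Recursively flatten tshark's nested layer dicts into field/value pairs.
    tshark already uses fully qualified dotted keys (e.g. 'ip.src'), so the
    leaf key is used directly as the field name."""
    if isinstance(node, dict):
        for key, value in node.items():
            _flatten(value, out, key)
    elif isinstance(node, list):
        for value in node:
            _flatten(value, out, name)
    else:
        out[name] = str(node)


def write_arff(records, relation, path):
    """Write one STRING attribute per field observed anywhere in the capture."""
    fields = sorted({name for rec in records for name in rec})
    with open(path, "w") as fh:
        fh.write(f"@RELATION {relation}\n\n")
        for name in fields:
            fh.write(f"@ATTRIBUTE {name} STRING\n")
        fh.write("\n@DATA\n")
        for rec in records:
            row = []
            for name in fields:
                if name in rec:
                    row.append("'" + rec[name].replace("'", "\\'") + "'")
                else:
                    row.append("?")  # field absent in this packet
            fh.write(",".join(row) + "\n")


if __name__ == "__main__":
    # Usage: python pcap_to_arff.py capture.pcap output.arff
    records = pcap_to_records(sys.argv[1])
    write_arff(records, "pcap_fields", sys.argv[2])
```

Declaring everything as STRING and filling absences with "?" keeps the sketch short; a practical preprocessor along the lines the paper describes would also need to decide which fields should become NUMERIC or NOMINAL attributes before Weka's algorithms can use them effectively.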