{"title":"Converting PCAPs into Weka mineable data","authors":"C. A. Fowler, R. Hammell","doi":"10.1109/SNPD.2014.6888681","DOIUrl":null,"url":null,"abstract":"In today's world there is an unprecedented volume of information available to organizations of all sizes; the “information overload” problem is well documented. This problem is especially challenging in the world of network intrusion detection. In this realm, we must not only deal with sifting through vast amounts of data, but we must also do it in a timely manner even when at times we are not sure what exactly it is we are trying to find. In the grander scheme of our work we intend to demonstrate that several different data mining algorithms reporting to an overarching layer will yield more accurate results than anyone data mining application (or algorithm) acting on its own. The system will operate in the domain of offline network and computer forensic data mining, under the guidance of a hybrid intelligence/multi-agent, systems based, for interpretation and interpolation of the findings. Toward that end, in this paper we build upon earlier work, undertaking the steps required for generating and preparing suitably minable data. Specifically, we are concerned with extracting as much useful data as possible out of a PCAP (Packet capture) for importing into Weka. While a PCAP may have thousands of field/value pairs, Wireshark and tshark's csv (comma separated value) output module only renders a small percentage of these fields and their values by default. We introduce a tool of our own making which enumerates every field (with or without a value) in any PCAP and generates an ARFF (Attribute-Relation File Format - Weka default). This code represents a component of a larger application we are designing (future work) which will ingest a PCAP, semi-autonomously preprocess it and feed it into Weka for processing/mining using several different algorithms.","PeriodicalId":272932,"journal":{"name":"15th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"15th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SNPD.2014.6888681","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 13
Abstract
In today's world there is an unprecedented volume of information available to organizations of all sizes; the “information overload” problem is well documented. This problem is especially challenging in the world of network intrusion detection. In this realm, we must not only sift through vast amounts of data, but we must do so in a timely manner, even when at times we are not sure exactly what we are trying to find. In the grander scheme of our work, we intend to demonstrate that several different data mining algorithms reporting to an overarching layer will yield more accurate results than any one data mining application (or algorithm) acting on its own. The system will operate in the domain of offline network and computer forensic data mining, under the guidance of a hybrid intelligence/multi-agent-based system for interpretation and interpolation of the findings. Toward that end, in this paper we build upon earlier work, undertaking the steps required for generating and preparing suitably mineable data. Specifically, we are concerned with extracting as much useful data as possible from a PCAP (packet capture) file for importing into Weka. While a PCAP may contain thousands of field/value pairs, Wireshark's and tshark's CSV (comma-separated value) output modules render only a small percentage of these fields and their values by default. We introduce a tool of our own making that enumerates every field (with or without a value) in any PCAP and generates an ARFF (Attribute-Relation File Format, Weka's default) file. This code represents a component of a larger application we are designing (future work) that will ingest a PCAP, semi-autonomously preprocess it, and feed it into Weka for processing/mining using several different algorithms.
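The paper's tool itself is not reproduced here, but the idea it describes — dump every field a dissector knows about, declare one ARFF attribute per observed field, and mark packets that lack a field with "?" — can be sketched with off-the-shelf pieces. The sketch below is an assumption-laden illustration, not the authors' implementation: it shells out to tshark's JSON export (available in recent tshark releases, newer than the 2014 toolchain the paper used), flattens the nested layer dictionaries, and writes a simplified ARFF in which every attribute is a STRING. Script name, helper names, and the choice of JSON over PDML are all illustrative.

```python
#!/usr/bin/env python3
"""Illustrative sketch (not the authors' tool): enumerate every field in a PCAP
via tshark's JSON export and emit a minimal Weka ARFF file. Assumes a recent
tshark is on the PATH; all attributes are declared STRING and packets missing
a field get the ARFF unknown marker '?'."""

import json
import subprocess
import sys


def pcap_to_records(pcap_path):
    """Run tshark and return one flat {field_name: value} dict per packet."""
    out = subprocess.run(
        ["tshark", "-r", pcap_path, "-T", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    records = []
    for pkt in json.loads(out):
        flat = {}
        _flatten(pkt.get("_source", {}).get("layers", {}), flat)
        records.append(flat)
    return records


def _flatten(node, out, name=""):
    """Recursively flatten tshark's nested layer dicts into field/value pairs.
    tshark already uses fully qualified dotted keys (e.g. 'ip.src'), so the
    leaf key is used directly as the field name."""
    if isinstance(node, dict):
        for key, value in node.items():
            _flatten(value, out, key)
    elif isinstance(node, list):
        for value in node:
            _flatten(value, out, name)
    else:
        out[name] = str(node)


def write_arff(records, relation, path):
    """Write one STRING attribute per field observed anywhere in the capture."""
    fields = sorted({name for rec in records for name in rec})
    with open(path, "w") as fh:
        fh.write(f"@RELATION {relation}\n\n")
        for name in fields:
            fh.write(f"@ATTRIBUTE {name} STRING\n")
        fh.write("\n@DATA\n")
        for rec in records:
            row = []
            for name in fields:
                if name in rec:
                    row.append("'" + rec[name].replace("'", "\\'") + "'")
                else:
                    row.append("?")  # field absent in this packet
            fh.write(",".join(row) + "\n")


if __name__ == "__main__":
    # Usage: python pcap_to_arff.py capture.pcap output.arff
    records = pcap_to_records(sys.argv[1])
    write_arff(records, "pcap_fields", sys.argv[2])
```

Declaring everything as STRING and filling absences with "?" keeps the sketch short; a practical preprocessor along the lines the paper describes would also need to decide which fields should become NUMERIC or NOMINAL attributes before Weka's algorithms can use them effectively.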