Bernhard Brenner, Joachim Fabini, Magnus Offermanns, Sabrina Semper, Tanja Zseby
Title: Malware communication in smart factories: A network traffic data set
Journal: Computer Networks (JCR Q1, Computer Science, Hardware & Architecture; impact factor 4.4)
Publication date: 2024-10-09
DOI: 10.1016/j.comnet.2024.110804
URL: https://www.sciencedirect.com/science/article/pii/S1389128624006364
Abstract
Machine learning-based intrusion detection requires suitable and realistic data sets for training and testing. However, data sets that originate from real networks are rare: network data is considered privacy-sensitive, and purposefully introducing malicious traffic is usually not possible. In this paper we introduce a labeled data set captured at a smart factory in Vienna, Austria, during normal operation and during penetration tests with different attack types. The data set consists of 173 GB of Packet Capture (PCAP) files, which represent 16 days (395 h) of factory operation. It includes Message Queuing Telemetry Transport (MQTT), OPC Unified Architecture (OPC UA), and Modbus/TCP traffic. The malicious traffic was generated by a professional penetration tester who performed two types of attacks: (a) aggressive attacks that are easier to detect and (b) stealthy attacks that are harder to detect. Our data set includes the raw PCAP files and the flow data extracted from them. Labels for packets and flows indicate whether each packet or flow originated from a specific attack or from benign communication. We describe the methodology for creating the data set, analyze the data, and provide detailed information about the recorded traffic. The data set is freely available to support reproducible research and the comparability of results in the area of intrusion detection in industrial networks.
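To illustrate how labeled flow records of this kind might be summarized, the following minimal Python sketch groups flows by industrial protocol and label. It is not the authors' tooling. It assumes the protocols run on their IANA-registered default ports (502 for Modbus/TCP, 1883 and 8883 for MQTT, 4840 for OPC UA), which the abstract does not state, and it uses a hypothetical record layout of (source port, destination port, label) tuples with made-up example values.

```python
from collections import Counter

# IANA-registered default ports for the three protocols named in the abstract.
# Assumption: the factory used default ports; the source does not say so.
DEFAULT_PORTS = {
    502: "Modbus/TCP",
    1883: "MQTT",
    8883: "MQTT",  # MQTT over TLS
    4840: "OPC UA",
}


def protocol_of(src_port: int, dst_port: int) -> str:
    """Guess the application protocol from either endpoint's port."""
    for port in (dst_port, src_port):
        if port in DEFAULT_PORTS:
            return DEFAULT_PORTS[port]
    return "other"


def count_by_protocol_and_label(flows):
    """Count flows per (protocol, label) pair.

    `flows` is an iterable of (src_port, dst_port, label) tuples, a
    hypothetical layout chosen for this sketch.
    """
    return Counter((protocol_of(s, d), label) for s, d, label in flows)


# Made-up example records, not taken from the data set.
example = [
    (49152, 1883, "benign"),
    (49153, 502, "benign"),
    (4840, 50000, "benign"),
    (49154, 502, "attack"),
    (49155, 22, "attack"),
]
counts = count_by_protocol_and_label(example)
```

Port-based classification is only a rough heuristic; a real analysis would read the protocol and label fields from the published flow files and adapt the record layout accordingly.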
About the journal:
Computer Networks is an international, archival journal providing a publication vehicle for complete coverage of all topics of interest to those involved in the computer communications networking area. The audience includes researchers, managers and operators of networks as well as designers and implementors. The Editorial Board will consider any material for publication that is of interest to those groups.