Toward generating a large-scale IoT-Zwave intrusion detection dataset: Smart device profiling, intruders behavior, and traffic characterization

IF 7.6 · Zone 3 (Computer Science) · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Internet of Things Pub Date : 2025-10-10 DOI:10.1016/j.iot.2025.101747

MohammadMoein Shafi, Arash Habibi Lashkari

Volume 34, Article 101747 · https://www.sciencedirect.com/science/article/pii/S2542660525002616

Citations: 0

Abstract

The rapid expansion of the Internet of Things (IoT) has introduced critical security challenges, making IoT ecosystems a prime target for cyber threats. Traditional security measures, relying on predefined signatures and static rules, struggle to detect sophisticated attacks that evolve dynamically. While machine learning and deep learning have improved IoT security, their effectiveness is fundamentally limited by the quality and diversity of available datasets. Existing IoT security datasets suffer from numerous shortcomings, including limited device diversity, inadequate threat coverage, the absence of real-world user and environment interaction, a lack of IoT-specific attacks, insufficient data volume, outdated threat scenarios, a lack of multimodal data, and a lack of support for multi-protocol analysis. To bridge this gap, we conducted a comprehensive analysis of the top 30 publicly available IoT smart home datasets, identifying 22 critical shortcomings that hinder their applicability in security research. To address these limitations, we introduce BCCC-IoT-IDS-Zwave-2025, the most extensive and diverse IoT smart home dataset to date, developed over five months using a large-scale testbed comprising more than 50 IoT devices and encompassing over 80 distinct attack scenarios. Unlike prior datasets that focus primarily on IP network-layer traffic, our dataset integrates multi-source data, including IP-based network traffic, IoT-Zwave communication signals, device activity, and MQTT-based traffic and logs, with attack scenarios specifically designed for each data source, enabling a holistic view of IoT threats. To further enhance IoT threat analysis, we developed IoT-ZwaveNetLyzer, the first dedicated traffic analyzer for Z-Wave networks, addressing the gap left by traditional PC-focused tools. 
Extensive experimental evaluations demonstrate the dataset’s effectiveness: state-of-the-art classifiers achieve an average detection accuracy exceeding 95% and an average false positive rate as low as 2.2%, establishing BCCC-IoT-IDS-Zwave-2025 as a cornerstone for future IoT security research and the development of advanced detection methodologies.
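The two headline metrics in the abstract, detection accuracy and false positive rate, can be computed from a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' evaluation code; the names `ConfusionCounts`, `confusion_counts`, `accuracy`, and `false_positive_rate` are hypothetical, and the labels in the usage lines are made-up illustration data, not values from the dataset.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (1 = attack, 0 = benign). Hypothetical helper."""
    tp: int  # attacks correctly flagged
    fp: int  # benign samples wrongly flagged as attacks
    tn: int  # benign samples correctly passed
    fn: int  # attacks that were missed


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> ConfusionCounts:
    """Count TP/FP/TN/FN from parallel sequences of 0/1 labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return ConfusionCounts(tp=tp, fp=fp, tn=tn, fn=fn)


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all samples classified correctly: (TP + TN) / total."""
    total = c.tp + c.fp + c.tn + c.fn
    if total == 0:
        raise ValueError("no samples")
    return (c.tp + c.tn) / total


def false_positive_rate(c: ConfusionCounts) -> float:
    """Fraction of benign samples wrongly flagged: FP / (FP + TN)."""
    negatives = c.fp + c.tn
    if negatives == 0:
        raise ValueError("no benign samples")
    return c.fp / negatives


# Illustration only: five made-up samples, not data from the paper.
counts = confusion_counts([1, 1, 0, 0, 0], [1, 0, 0, 1, 0])
print(accuracy(counts), false_positive_rate(counts))
```

The abstract reports averages across classifiers; under this sketch that would mean computing both metrics per classifier and taking the mean, though the abstract does not spell out the exact averaging procedure.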


Source journal

Internet of Things Multiple-

CiteScore

3.60

Self-citation rate

5.10%

Articles published

115

Review time

37 days

About the journal: Internet of Things; Engineering Cyber Physical Human Systems is a comprehensive journal encouraging cross collaboration between researchers, engineers and practitioners in the field of IoT & Cyber Physical Human Systems. The journal offers a unique platform to exchange scientific information on the entire breadth of technology, science, and societal applications of the IoT. The journal will place a high priority on timely publication, and provide a home for high quality. Furthermore, IOT is interested in publishing topical Special Issues on any aspect of IOT.