Synthetic Network Packet Data Generator With Protocol State and Arrival Timing Awareness

IF 3.6 | Zone 3, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS
Yukito Onodera, Erina Takeshita, Tomoya Kosugi, Takashi Nakanishi, Tatsuya Shimada
IEEE Access, vol. 13, pp. 153398-153405. Published 2025-09-02 (Journal Article).
DOI: 10.1109/ACCESS.2025.3604773
URL: https://ieeexplore.ieee.org/document/11146712/
Citations: 0

Abstract

Recent advances in network traffic analysis apply machine learning (ML) to packet-level data, but obtaining large amounts of real data for training ML models is becoming a challenge. ML-based generators such as Generative Adversarial Networks and Variational Autoencoders are promising candidates for generating packet-level data, but they often fail to preserve TCP protocol states and packet arrival times. Without accurately reproducing these characteristics, synthetic data may fail to reflect real network behavior, potentially leading to erroneous conclusions when training and evaluating ML models. This paper proposes a new method for synthesizing realistic packet-level data with accurate protocol transitions and timing. By generating IP addresses and port numbers separately and aligning packet timing with handshake completion, the model avoids overfitting. On the CIC-IDS 2017 dataset, our method better replicates headers and timing, producing synthetic data whose feature distributions closely resemble those of real data. Visual and statistical comparisons show that the synthetic data closely resembles real packet traces, providing a reliable alternative for training and evaluating network analysis models.
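The abstract does not describe the generator's internals, so the following is only a minimal Python sketch of the two constraints it emphasizes, not the authors' method. It assumes a hypothetical flow representation (`Packet`), draws client and server endpoints in a separate step (`sample_endpoints`) from a hypothetical address pool, emits a fixed SYN, SYN-ACK, ACK prefix, and schedules every data packet after the handshake-completing ACK. The round-trip time `rtt`, the exponential inter-arrival model with mean `mean_gap`, the server ports 80/443, and all function names are illustrative assumptions; connection teardown is omitted.

```python
import random
from dataclasses import dataclass


@dataclass
class Packet:
    """One synthetic packet (hypothetical, minimal header subset)."""
    t: float    # arrival time in seconds, relative to the first SYN
    src: str    # "ip:port" of the sender
    dst: str    # "ip:port" of the receiver
    flags: str  # TCP flags used here: "S", "SA", "A", "PA"


def sample_endpoints(rng, ip_pool, client_ports=(49152, 65535), server_ports=(80, 443)):
    """Draw endpoints in a step separate from packet sequencing and timing.

    Uniform sampling from a pool is a placeholder assumption, not the paper's model.
    """
    client_ip, server_ip = rng.sample(ip_pool, 2)
    client = f"{client_ip}:{rng.randint(*client_ports)}"
    server = f"{server_ip}:{rng.choice(server_ports)}"
    return client, server


def synthesize_flow(rng, ip_pool, n_data, rtt=0.02, mean_gap=0.005):
    """Build one flow: a three-way handshake, then data timed after its completion."""
    client, server = sample_endpoints(rng, ip_pool)
    pkts = [
        Packet(0.0, client, server, "S"),
        Packet(rtt / 2, server, client, "SA"),
        Packet(rtt, client, server, "A"),  # handshake completes here
    ]
    t = rtt
    for i in range(n_data):
        # Inter-arrival gaps are added to the handshake-completion time, so no
        # payload can appear before the connection is established.
        t += rng.expovariate(1.0 / mean_gap)
        sender, receiver = (client, server) if i % 2 == 0 else (server, client)
        pkts.append(Packet(t, sender, receiver, "PA"))
    return pkts


def is_valid_handshake(pkts):
    """Check SYN -> SYN-ACK -> ACK order and direction, and non-decreasing times."""
    if len(pkts) < 3:
        return False
    syn, synack, ack = pkts[:3]
    if (syn.flags, synack.flags, ack.flags) != ("S", "SA", "A"):
        return False
    # SYN-ACK must reverse the SYN's direction; ACK must repeat it.
    if (synack.src, synack.dst, ack.src, ack.dst) != (syn.dst, syn.src, syn.src, syn.dst):
        return False
    # Non-decreasing times imply no data packet precedes the completing ACK.
    return all(a.t <= b.t for a, b in zip(pkts, pkts[1:]))
```

For example, `is_valid_handshake(synthesize_flow(random.Random(0), ["10.0.0.1", "10.0.0.2"], n_data=4))` returns `True`, while swapping the first two packets of that flow makes it return `False`. A check of this kind could be used to filter or score generated flows; how the paper itself enforces protocol states is not stated in the abstract.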

Source journal: IEEE Access
Subject categories: COMPUTER SCIENCE, INFORMATION SYSTEMS; ENGINEERING, ELECTRICAL & ELECTRONIC
CiteScore: 9.80
Self-citation rate: 7.70%
Articles published: 6673
Review time: 6 weeks
Journal description: IEEE Access® is a multidisciplinary, open access (OA), applications-oriented, all-electronic archival journal that continuously presents the results of original research or development across all of IEEE's fields of interest. IEEE Access will publish articles that are of high interest to readers, original, technically correct, and clearly presented. Supported by author publication charges (APC), its hallmarks are a rapid peer review and publication process with open access to all readers. Unlike IEEE's traditional Transactions or Journals, reviews are "binary", in that reviewers will either Accept or Reject an article in the form in which it is submitted, in order to achieve rapid turnaround. Especially encouraged are submissions on: Multidisciplinary topics, or applications-oriented articles and negative results that do not fit within the scope of IEEE's traditional journals. Practical articles discussing new experiments or measurement techniques, interesting solutions to engineering. Development of new or improved fabrication or manufacturing techniques. Reviews or survey articles of new or evolving fields oriented to assist others in understanding the new area.