Synthetic Network Packet Data Generator With Protocol State and Arrival Timing Awareness

IF 3.6; Zone 3, Computer Science; Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Access Pub Date : 2025-09-02 DOI:10.1109/ACCESS.2025.3604773

Yukito Onodera;Erina Takeshita;Tomoya Kosugi;Takashi Nakanishi;Tatsuya Shimada

IEEE Access, vol. 13, pp. 153398-153405. Journal Article. Full text: https://ieeexplore.ieee.org/document/11146712/

Citations: 0

Abstract

Recent advances in network traffic analysis apply machine learning (ML) to packet-level data, but obtaining large amounts of real data for training ML models is becoming a challenge. ML-based generators such as Generative Adversarial Networks and Variational Autoencoders are promising candidates for generating packet-level data but often fail to preserve TCP protocol states and packet arrival times. Without accurately reproducing these characteristics, the synthetic data may fail to reflect real network behavior, potentially leading to erroneous conclusions in the training and evaluation of machine learning models. This paper proposes a new method to synthesize realistic packet-level data with accurate protocol transitions and timing. By separating IP address and port number generation and aligning timing with handshake completion, the model avoids overfitting. Using the CIC-IDS 2017 dataset, our method better replicates headers and timing, producing synthetic data whose feature distributions closely resemble those of real data. Visual and statistical comparisons show that the synthetic data closely resembles real packet traces, providing a reliable alternative for training and evaluating network analysis.
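To make the two properties named in the abstract concrete, the following is a minimal sketch of how one might check a synthetic TCP flow for a valid three-way handshake and extract its inter-arrival times. It is not the authors' method: the `Packet` record, the `handshake_ok` and `inter_arrival_times` helpers, and the flag representation are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass
class Packet:
    """Hypothetical minimal packet record (an assumption, not from the paper)."""
    ts: float                 # arrival time in seconds
    from_client: bool         # True if sent by the connection initiator
    flags: FrozenSet[str]     # TCP flags set on the packet, e.g. {"SYN", "ACK"}


# Expected (direction, flags) for the TCP three-way handshake.
HANDSHAKE = [
    (True, frozenset({"SYN"})),
    (False, frozenset({"SYN", "ACK"})),
    (True, frozenset({"ACK"})),
]


def handshake_ok(flow: List[Packet]) -> bool:
    """Return True if the flow opens with SYN, SYN-ACK, ACK in the right
    directions and its timestamps never go backwards."""
    if len(flow) < len(HANDSHAKE):
        return False
    for pkt, (direction, flags) in zip(flow, HANDSHAKE):
        if pkt.from_client != direction or pkt.flags != flags:
            return False
    return all(a.ts <= b.ts for a, b in zip(flow, flow[1:]))


def inter_arrival_times(flow: List[Packet]) -> List[float]:
    """Gaps between consecutive packets, for comparing timing distributions."""
    return [b.ts - a.ts for a, b in zip(flow, flow[1:])]


if __name__ == "__main__":
    flow = [
        Packet(0.0, True, frozenset({"SYN"})),
        Packet(0.25, False, frozenset({"SYN", "ACK"})),
        Packet(0.5, True, frozenset({"ACK"})),
    ]
    print(handshake_ok(flow), inter_arrival_times(flow))
```

A check of this kind could be run over generated flows to count protocol-state violations, and the inter-arrival lists from real and synthetic traces could then be compared with any standard distribution test.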




Source journal

IEEE Access (COMPUTER SCIENCE, INFORMATION SYSTEMS; ENGINEERING, ELECTRICAL & ELECTRONIC)

CiteScore: 9.80

Self-citation rate: 7.70%

Articles published: 6673

Review time: 6 weeks

Journal description: IEEE Access® is a multidisciplinary, open access (OA), applications-oriented, all-electronic archival journal that continuously presents the results of original research or development across all of IEEE's fields of interest. IEEE Access will publish articles that are of high interest to readers, original, technically correct, and clearly presented. Supported by author publication charges (APC), its hallmarks are a rapid peer review and publication process with open access to all readers. Unlike IEEE's traditional Transactions or Journals, reviews are "binary", in that reviewers will either Accept or Reject an article in the form it is submitted in order to achieve rapid turnaround. Especially encouraged are submissions on: Multidisciplinary topics, or applications-oriented articles and negative results that do not fit within the scope of IEEE's traditional journals. Practical articles discussing new experiments or measurement techniques, interesting solutions to engineering. Development of new or improved fabrication or manufacturing techniques. Reviews or survey articles of new or evolving fields oriented to assist others in understanding the new area.