Synthetic Network Packet Data Generator With Protocol State and Arrival Timing Awareness
Yukito Onodera; Erina Takeshita; Tomoya Kosugi; Takashi Nakanishi; Tatsuya Shimada
IEEE Access, vol. 13, pp. 153398-153405, published 2025-09-02
DOI: 10.1109/ACCESS.2025.3604773
https://ieeexplore.ieee.org/document/11146712/
Abstract
Recent advances in network traffic analysis apply machine learning (ML) to packet-level data, but obtaining large amounts of real data for training ML models is becoming a challenge. ML-based generators such as Generative Adversarial Networks and Variational Autoencoders are promising candidates for generating packet-level data, but they often fail to preserve TCP protocol states and packet arrival times. If these characteristics are not reproduced accurately, the synthetic data may fail to reflect real network behavior, potentially leading to erroneous conclusions in the training and evaluation of ML models. This paper proposes a new method for synthesizing realistic packet-level data with accurate protocol transitions and timing. By separating IP address generation from port number generation and aligning timing with handshake completion, the model avoids overfitting. On the CIC-IDS 2017 dataset, the method replicates headers and timing better, producing synthetic data whose feature distributions closely resemble those of the real data. Visual and statistical comparisons show that the synthetic data closely resembles real packet traces, providing a reliable alternative for training and evaluating network analysis models.
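To make "preserving TCP protocol states and arrival timing" concrete, the following minimal sketch checks whether a synthetic flow begins with a well-formed three-way handshake and has non-decreasing arrival times. It is not the authors' generator or evaluation code: the `Packet` type, its fields, and both function names are hypothetical, and the check is deliberately simplified (for example, it ignores TCP Fast Open, where a SYN may carry data).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """Hypothetical minimal packet record used only for this illustration."""
    t: float            # arrival time in seconds
    from_client: bool   # True if sent by the connection initiator
    flags: frozenset    # TCP flags, e.g. frozenset({"SYN", "ACK"})
    payload_len: int = 0


# Expected (direction, flags) for the three-way handshake: SYN, SYN-ACK, ACK.
HANDSHAKE = (
    (True, frozenset({"SYN"})),
    (False, frozenset({"SYN", "ACK"})),
    (True, frozenset({"ACK"})),
)


def handshake_completion_time(packets):
    """Return the arrival time of the final handshake ACK, or None if the
    first three packets do not form SYN, SYN-ACK, ACK in that order."""
    if len(packets) < 3:
        return None
    for pkt, (from_client, flags) in zip(packets, HANDSHAKE):
        if pkt.from_client != from_client or pkt.flags != flags:
            return None
    return packets[2].t


def is_consistent(packets):
    """Simplified plausibility check for one synthetic TCP flow."""
    times = [p.t for p in packets]
    # Arrival times must never go backwards.
    if any(later < earlier for earlier, later in zip(times, times[1:])):
        return False
    # The flow must open with a valid handshake.
    if handshake_completion_time(packets) is None:
        return False
    # In this simplified model, handshake packets carry no payload.
    return all(p.payload_len == 0 for p in packets[:3])


flow = [
    Packet(0.000, True, frozenset({"SYN"})),
    Packet(0.010, False, frozenset({"SYN", "ACK"})),
    Packet(0.020, True, frozenset({"ACK"})),
    Packet(0.035, True, frozenset({"PSH", "ACK"}), payload_len=120),
]
print(is_consistent(flow), handshake_completion_time(flow))
```

A generator that samples each packet independently can easily emit data before the handshake completes or produce out-of-order timestamps. A check of this kind flags such flows before they are used for training or evaluation.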
IEEE Access (Computer Science, Information Systems; Engineering, Electrical & Electronic)