Blind protocol identification using synthetic dataset: A case study on geographic protocols

IF 2 4区 医学 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS
Mohammad Abbasi-Azar , Mehdi Teimouri , Mohsen Nikray
{"title":"Blind protocol identification using synthetic dataset: A case study on geographic protocols","authors":"Mohammad Abbasi-Azar ,&nbsp;Mehdi Teimouri ,&nbsp;Mohsen Nikray","doi":"10.1016/j.fsidi.2025.301911","DOIUrl":null,"url":null,"abstract":"<div><div>Network forensics faces major challenges, including increasingly sophisticated cyberattacks and the difficulty of obtaining labeled datasets for training AI-driven security tools. Blind Protocol Identification (BPI), essential for detecting covert data transfers, is particularly impacted by these data limitations. This paper introduces a novel and inherently scalable method for generating synthetic datasets tailored for BPI in network forensics. Our approach emphasizes feature engineering and a statistical-analytical model of feature distributions to address the scarcity and imbalance of labeled data. We demonstrate the effectiveness of this method through a case study on geographic protocols, where we train Random Forest models using only synthetic datasets and evaluate their performance on real-world traffic. This work presents a promising solution to the data challenges in BPI, enabling reliable protocol identification while maintaining data privacy and overcoming traditional data collection limitations.</div></div>","PeriodicalId":48481,"journal":{"name":"Forensic Science International-Digital Investigation","volume":"53 ","pages":"Article 301911"},"PeriodicalIF":2.0000,"publicationDate":"2025-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Forensic Science International-Digital Investigation","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2666281725000502","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0

Abstract

Network forensics faces major challenges, including increasingly sophisticated cyberattacks and the difficulty of obtaining labeled datasets for training AI-driven security tools. Blind Protocol Identification (BPI), essential for detecting covert data transfers, is particularly impacted by these data limitations. This paper introduces a novel and inherently scalable method for generating synthetic datasets tailored for BPI in network forensics. Our approach emphasizes feature engineering and a statistical-analytical model of feature distributions to address the scarcity and imbalance of labeled data. We demonstrate the effectiveness of this method through a case study on geographic protocols, where we train Random Forest models using only synthetic datasets and evaluate their performance on real-world traffic. This work presents a promising solution to the data challenges in BPI, enabling reliable protocol identification while maintaining data privacy and overcoming traditional data collection limitations.
基于合成数据集的协议盲识别:以地理协议为例
网络取证面临着重大挑战,包括日益复杂的网络攻击,以及难以获得标记数据集来训练人工智能驱动的安全工具。盲协议识别(BPI)对于检测隐蔽数据传输至关重要,尤其受到这些数据限制的影响。本文介绍了一种新颖且具有固有可扩展性的方法,用于生成针对网络取证中BPI定制的合成数据集。我们的方法强调特征工程和特征分布的统计分析模型,以解决标记数据的稀缺性和不平衡性。我们通过地理协议的案例研究证明了这种方法的有效性,其中我们仅使用合成数据集训练随机森林模型,并评估其在现实世界流量中的性能。这项工作为BPI中的数据挑战提供了一个有希望的解决方案,在保持数据隐私和克服传统数据收集限制的同时实现可靠的协议识别。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
CiteScore
5.90
自引率
15.00%
发文量
87
审稿时长
76 days
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信