Mohammad Abbasi-Azar , Mehdi Teimouri , Mohsen Nikray
{"title":"基于合成数据集的协议盲识别:以地理协议为例","authors":"Mohammad Abbasi-Azar , Mehdi Teimouri , Mohsen Nikray","doi":"10.1016/j.fsidi.2025.301911","DOIUrl":null,"url":null,"abstract":"<div><div>Network forensics faces major challenges, including increasingly sophisticated cyberattacks and the difficulty of obtaining labeled datasets for training AI-driven security tools. Blind Protocol Identification (BPI), essential for detecting covert data transfers, is particularly impacted by these data limitations. This paper introduces a novel and inherently scalable method for generating synthetic datasets tailored for BPI in network forensics. Our approach emphasizes feature engineering and a statistical-analytical model of feature distributions to address the scarcity and imbalance of labeled data. We demonstrate the effectiveness of this method through a case study on geographic protocols, where we train Random Forest models using only synthetic datasets and evaluate their performance on real-world traffic. This work presents a promising solution to the data challenges in BPI, enabling reliable protocol identification while maintaining data privacy and overcoming traditional data collection limitations.</div></div>","PeriodicalId":48481,"journal":{"name":"Forensic Science International-Digital Investigation","volume":"53 ","pages":"Article 301911"},"PeriodicalIF":2.0000,"publicationDate":"2025-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Blind protocol identification using synthetic dataset: A case study on geographic protocols\",\"authors\":\"Mohammad Abbasi-Azar , Mehdi Teimouri , Mohsen Nikray\",\"doi\":\"10.1016/j.fsidi.2025.301911\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Network forensics faces major challenges, including increasingly sophisticated cyberattacks and the difficulty of obtaining labeled datasets for training AI-driven security tools. Blind Protocol Identification (BPI), essential for detecting covert data transfers, is particularly impacted by these data limitations. This paper introduces a novel and inherently scalable method for generating synthetic datasets tailored for BPI in network forensics. Our approach emphasizes feature engineering and a statistical-analytical model of feature distributions to address the scarcity and imbalance of labeled data. We demonstrate the effectiveness of this method through a case study on geographic protocols, where we train Random Forest models using only synthetic datasets and evaluate their performance on real-world traffic. This work presents a promising solution to the data challenges in BPI, enabling reliable protocol identification while maintaining data privacy and overcoming traditional data collection limitations.</div></div>\",\"PeriodicalId\":48481,\"journal\":{\"name\":\"Forensic Science International-Digital Investigation\",\"volume\":\"53 \",\"pages\":\"Article 301911\"},\"PeriodicalIF\":2.0000,\"publicationDate\":\"2025-03-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Forensic Science International-Digital Investigation\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2666281725000502\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Forensic Science International-Digital Investigation","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2666281725000502","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Blind protocol identification using synthetic dataset: A case study on geographic protocols
Network forensics faces major challenges, including increasingly sophisticated cyberattacks and the difficulty of obtaining labeled datasets for training AI-driven security tools. Blind Protocol Identification (BPI), essential for detecting covert data transfers, is particularly impacted by these data limitations. This paper introduces a novel and inherently scalable method for generating synthetic datasets tailored for BPI in network forensics. Our approach emphasizes feature engineering and a statistical-analytical model of feature distributions to address the scarcity and imbalance of labeled data. We demonstrate the effectiveness of this method through a case study on geographic protocols, where we train Random Forest models using only synthetic datasets and evaluate their performance on real-world traffic. This work presents a promising solution to the data challenges in BPI, enabling reliable protocol identification while maintaining data privacy and overcoming traditional data collection limitations.