Adaptive header identification and unsupervised clustering strategy for enhanced protocol reverse engineering
Authors: Mingliang Zhu, Chunxiang Gu, Xieli Zhang, Qingjun Yuan, Mengcheng Ju, Guanping Zhang, Xi Chen
Journal: Expert Systems with Applications, Volume 291, Article 128467 (Impact Factor 7.5; JCR Q1, Computer Science, Artificial Intelligence)
Publication date: 2025-06-14
DOI: 10.1016/j.eswa.2025.128467
URL: https://www.sciencedirect.com/science/article/pii/S095741742502086X
Citations: 0
Abstract
Protocol reverse engineering is critical for ensuring network security and understanding proprietary communication mechanisms. Most traditional network trace-based methods suffer from high computational complexity, excessive memory usage, and sensitivity to payload variations. In this paper, we propose a method that integrates adaptive message header recognition with unsupervised clustering strategies for protocol reverse engineering. Using mean entropy change and change point detection algorithms, our method automatically identifies message headers, reducing the impact of payload variations on similarity measurements. Our method then clusters only a small set of selected core samples of message headers and assigns the remaining samples to the resulting categories, which significantly reduces computational resource consumption while maintaining clustering performance. Moreover, leveraging the identified message headers, we incorporate a hierarchical format inference technique and design a function code field detector, which enhance the accuracy and efficiency of protocol reverse engineering. Our evaluation across eight widely used protocols demonstrates that our method achieves homogeneity and completeness scores of 0.94 and 0.74, respectively, in message type identification. These results significantly outperform those of existing protocol reverse engineering tools on the same datasets (homogeneity, completeness): MFD&DBSCAN (0.31, 0.73), NEMETYL (0.73, 0.64), and Netzob (0.34, 0.76). Furthermore, our method achieves a perfection score 1.2× higher than Binaryinferno in format inference.
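The header identification idea can be illustrated with a small sketch. The abstract does not define "mean entropy change" or name the change point detection algorithm, so the code below substitutes two simple stand-ins: the Shannon entropy of the byte values at each message offset, and a single change point placed by a least-squares two-segment fit. All function names (`offset_entropies`, `single_change_point`, `estimate_header_length`, `make_toy_trace`) and the synthetic trace are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def offset_entropies(messages, max_len):
    """Shannon entropy (in bits) of the byte values observed at each offset."""
    entropies = []
    for i in range(max_len):
        column = [m[i] for m in messages if len(m) > i]
        if not column:
            break  # no message is long enough to reach this offset
        n = len(column)
        counts = Counter(column)
        entropies.append(sum(-(c / n) * math.log2(c / n) for c in counts.values()))
    return entropies


def _sse(values):
    """Sum of squared deviations from the mean of a non-empty list."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def single_change_point(values):
    """Index k that best splits values into two constant-mean segments.

    Assumption: one least-squares change point stands in for the paper's
    (unspecified) change point detection algorithm.
    """
    if len(values) < 2:
        raise ValueError("need at least two values to place a change point")
    return min(range(1, len(values)),
               key=lambda k: _sse(values[:k]) + _sse(values[k:]))


def estimate_header_length(messages, max_len=64):
    """Hypothetical helper: header length = offset where entropy level shifts."""
    return single_change_point(offset_entropies(messages, max_len))


def make_toy_trace(count=64, payload_len=8):
    """Synthetic trace: 3 constant bytes, 1 two-valued type byte, varied payload."""
    trace = []
    for j in range(count):
        header = bytes([0x01, 0x00, 0x00, j % 2])
        payload = bytes((j * 37 + i * 11) % 256 for i in range(payload_len))
        trace.append(header + payload)
    return trace


if __name__ == "__main__":
    trace = make_toy_trace()
    print(offset_entropies(trace, 12))
    print(estimate_header_length(trace))
```

On this toy trace the low-entropy header offsets and the high-entropy payload offsets form two clearly separated levels, so the split lands at offset 4. Real traces would likely need variable-length handling and multiple change points, which this sketch leaves out.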
About the journal:
Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.