Adaptive header identification and unsupervised clustering strategy for enhanced protocol reverse engineering

IF 7.5 · Zone 1 (Computer Science) · JCR Q1: Computer Science, Artificial Intelligence

Expert Systems with Applications Pub Date : 2025-06-14 DOI:10.1016/j.eswa.2025.128467

Mingliang Zhu, Chunxiang Gu, Xieli Zhang, Qingjun Yuan, Mengcheng Ju, Guanping Zhang, Xi Chen

Volume 291, Article 128467. Full text: https://www.sciencedirect.com/science/article/pii/S095741742502086X


Abstract

Protocol reverse engineering is critical for ensuring network security and understanding proprietary communication mechanisms. Most traditional network trace-based methods face challenges such as high computational complexity, excessive memory usage, and sensitivity to payload variations. In this paper, we propose a method that integrates adaptive message header recognition with unsupervised clustering strategies for protocol reverse engineering. Utilizing mean entropy change and change point detection algorithms, our method automatically identifies message headers, reducing the impact of payload variations on similarity measurements. Following this, our method significantly reduces computational resource consumption while maintaining clustering performance, by clustering based on a small set of selected core samples of message headers and assigning the remaining samples to existing categories. Moreover, leveraging the identified message headers, we incorporate a hierarchical format inference technique and design a function code field detector, which enhances the accuracy and efficiency of protocol reverse engineering. Our evaluation across eight widely used protocols demonstrates that our method achieves homogeneity and completeness scores of 0.94 and 0.74, respectively, in message type identification. These results significantly outperform existing protocol reverse engineering tools on the same datasets: MFD&DBSCAN (0.31, 0.73), NEMETYL (0.73, 0.64), and Netzob (0.34, 0.76). Furthermore, our method achieves a perfection score $1.2\times$ higher than Binaryinferno in format inference.
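The abstract does not spell out how the entropy signal and the change point detector are combined, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that header bytes vary little across messages (low per-offset entropy) while payload bytes vary a lot, and it uses a simple single-split, least-squares mean-shift detector as a stand-in for whatever change point method the paper actually uses. The function names `offset_entropies`, `single_change_point`, and `sse` are hypothetical.

```python
import math
from collections import Counter


def offset_entropies(messages, max_len=None):
    """Shannon entropy (in bits) of the byte values observed at each offset.

    Only offsets present in every message are considered (assumption:
    the header fits inside the shortest message).
    """
    limit = min(len(m) for m in messages)
    if max_len is not None:
        limit = min(limit, max_len)
    total = len(messages)
    entropies = []
    for i in range(limit):
        counts = Counter(m[i] for m in messages)
        h = -sum((c / total) * math.log2(c / total) for c in counts.values())
        entropies.append(h)
    return entropies


def sse(xs):
    """Sum of squared deviations of xs from its own mean."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


def single_change_point(values):
    """Return k (1 <= k < len(values)) that best splits values in two.

    The split minimises sse(values[:k]) + sse(values[k:]), i.e. it finds
    the single most likely shift in the mean of the sequence. Here k is
    read as the estimated header length in bytes (an assumption of this
    sketch).
    """
    if len(values) < 2:
        raise ValueError("need at least two values")
    best_k, best_cost = 1, float("inf")
    for k in range(1, len(values)):
        cost = sse(values[:k]) + sse(values[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k
```

Calling `single_change_point(offset_entropies(messages))` on a list of `bytes` objects gives an estimated header length. Truncating every message to that length before computing similarities is one way to keep payload variation out of the clustering step, which is the effect the abstract describes.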


Source journal

Expert Systems with Applications (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 13.80
Self-citation rate: 10.60%
Articles published: 2045
Review time: 8.7 months

Journal description: Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.