Degang Li , Xi Chen, Mingliang Zhu, Qingjun Yuan, Chunxiang Gu
{"title":"基于消息表示和标识符指纹的细粒度消息聚类方法","authors":"Degang Li , Xi Chen, Mingliang Zhu, Qingjun Yuan, Chunxiang Gu","doi":"10.1016/j.cose.2025.104631","DOIUrl":null,"url":null,"abstract":"<div><div>Protocol reverse engineering is a critical technique for analyzing private protocols and unknown protocols. Message clustering is a foundational element of protocol reverse engineering, playing a key role in traffic classification and format inference. In this paper, we propose a fine-grained unknown message clustering method, termed FG-MCRF. FG-MCRF extracts deep representation vectors from the raw message data by constructing a representation network with low information loss and constructs high-purity message clusters based on representation vectors. The FG-MCRF method constructs high-precision global message fingerprints for each message cluster based on message length identifiers, operation identifiers, and counter identifiers. Subsequently, FG-MCRF constructs a message relationship graph based on these global message fingerprints and determines the final message type using the relationship graph. We also introduce the fine-grained multi-protocol dataset (FgMPD) to evaluate the clustering performance of our method. The experimental results demonstrate that the FG-MCRF methodology achieves superior clustering performance on the FgMPD dataset, outperforming other baseline methods. The clustering purity, Adjusted Rand Index (ARI), completeness, and accuracy of FG-MCRF in the fine-grained message clustering task are 0.9961, 0.9897, 0.9837, and 0.9899, respectively, representing improvements of 3.2%, 10.5%, 10.9% and 8.7% compared to state-of-the-art (SOTA) baseline methods. These results indicate that the FG-MCRF method possesses robust generalization capacity and extensibility, facilitating fine-grained message clustering.</div></div>","PeriodicalId":51004,"journal":{"name":"Computers & Security","volume":"158 ","pages":"Article 104631"},"PeriodicalIF":5.4000,"publicationDate":"2025-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A fine-grained message clustering method based on message representation and identifier fingerprints\",\"authors\":\"Degang Li , Xi Chen, Mingliang Zhu, Qingjun Yuan, Chunxiang Gu\",\"doi\":\"10.1016/j.cose.2025.104631\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Protocol reverse engineering is a critical technique for analyzing private protocols and unknown protocols. Message clustering is a foundational element of protocol reverse engineering, playing a key role in traffic classification and format inference. In this paper, we propose a fine-grained unknown message clustering method, termed FG-MCRF. FG-MCRF extracts deep representation vectors from the raw message data by constructing a representation network with low information loss and constructs high-purity message clusters based on representation vectors. The FG-MCRF method constructs high-precision global message fingerprints for each message cluster based on message length identifiers, operation identifiers, and counter identifiers. Subsequently, FG-MCRF constructs a message relationship graph based on these global message fingerprints and determines the final message type using the relationship graph. We also introduce the fine-grained multi-protocol dataset (FgMPD) to evaluate the clustering performance of our method. The experimental results demonstrate that the FG-MCRF methodology achieves superior clustering performance on the FgMPD dataset, outperforming other baseline methods. The clustering purity, Adjusted Rand Index (ARI), completeness, and accuracy of FG-MCRF in the fine-grained message clustering task are 0.9961, 0.9897, 0.9837, and 0.9899, respectively, representing improvements of 3.2%, 10.5%, 10.9% and 8.7% compared to state-of-the-art (SOTA) baseline methods. These results indicate that the FG-MCRF method possesses robust generalization capacity and extensibility, facilitating fine-grained message clustering.</div></div>\",\"PeriodicalId\":51004,\"journal\":{\"name\":\"Computers & Security\",\"volume\":\"158 \",\"pages\":\"Article 104631\"},\"PeriodicalIF\":5.4000,\"publicationDate\":\"2025-08-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computers & Security\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0167404825003207\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers & Security","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0167404825003207","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
A fine-grained message clustering method based on message representation and identifier fingerprints
Protocol reverse engineering is a critical technique for analyzing private protocols and unknown protocols. Message clustering is a foundational element of protocol reverse engineering, playing a key role in traffic classification and format inference. In this paper, we propose a fine-grained unknown message clustering method, termed FG-MCRF. FG-MCRF extracts deep representation vectors from the raw message data by constructing a representation network with low information loss and constructs high-purity message clusters based on representation vectors. The FG-MCRF method constructs high-precision global message fingerprints for each message cluster based on message length identifiers, operation identifiers, and counter identifiers. Subsequently, FG-MCRF constructs a message relationship graph based on these global message fingerprints and determines the final message type using the relationship graph. We also introduce the fine-grained multi-protocol dataset (FgMPD) to evaluate the clustering performance of our method. The experimental results demonstrate that the FG-MCRF methodology achieves superior clustering performance on the FgMPD dataset, outperforming other baseline methods. The clustering purity, Adjusted Rand Index (ARI), completeness, and accuracy of FG-MCRF in the fine-grained message clustering task are 0.9961, 0.9897, 0.9837, and 0.9899, respectively, representing improvements of 3.2%, 10.5%, 10.9% and 8.7% compared to state-of-the-art (SOTA) baseline methods. These results indicate that the FG-MCRF method possesses robust generalization capacity and extensibility, facilitating fine-grained message clustering.
期刊介绍:
Computers & Security is the most respected technical journal in the IT security field. With its high-profile editorial board and informative regular features and columns, the journal is essential reading for IT security professionals around the world.
Computers & Security provides you with a unique blend of leading edge research and sound practical management advice. It is aimed at the professional involved with computer security, audit, control and data integrity in all sectors - industry, commerce and academia. Recognized worldwide as THE primary source of reference for applied research and technical expertise it is your first step to fully secure systems.