Hierarchical graph neural network for compressed speech steganalysis
Authors: Mustapha Hemis, Hamza Kheddar, Mohamed Chahine Ghanem, Bachir Boudraa
Journal: Array, volume 28, Article 100510 (JCR Q2, Computer Science, Theory & Methods; impact factor 4.5)
Publication date: 2025-09-25
Publication type: Journal Article
DOI: 10.1016/j.array.2025.100510
URL: https://www.sciencedirect.com/science/article/pii/S2590005625001377
Citations: 0
Abstract
Steganalysis methods based on deep learning (DL) often suffer from high computational complexity and have difficulty generalizing across datasets. For voice-over-IP (VoIP) speech streams, detection is particularly challenging because low bit-rate encoding creates complex relational dependencies between speech frames. Conventional DL models, which treat data as simple sequences or grids, often fail to capture these inter-frame dependencies effectively. To address this gap, this paper presents the first application of a graph neural network (GNN), specifically the GraphSAGE architecture, to steganalysis of compressed VoIP speech streams. The method builds graphs from VoIP streams in a straightforward way and uses GraphSAGE to capture hierarchical steganalysis information, including both fine-grained details and high-level patterns, thereby achieving high detection accuracy. Experimental results show that the approach performs well in uncovering quantization index modulation (QIM)-based steganographic patterns in VoIP signals. It achieves detection accuracy exceeding 98% even for short 0.5-second samples, and 95.17% accuracy under challenging conditions with low embedding rates, an improvement of 2.8% over the best-performing state-of-the-art methods. The model is also efficient, with an average detection time as low as 0.016 s for 0.5-second samples, an improvement of 0.003 s. This makes it suitable for online steganalysis, providing a superior balance between detection accuracy and efficiency under the constraint of short samples with low embedding rates.
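To make the idea of GraphSAGE over a frame graph concrete, the following is a minimal sketch in plain Python, not the authors' implementation. The abstract does not specify how the graph is built or how the network is configured, so everything here is an assumption for illustration: each speech frame is a node, frames within a fixed `window` of each other are linked, node features are small hypothetical vectors standing in for per-frame codeword information, and the layer is the standard GraphSAGE mean aggregator (without the optional output normalization) followed by a mean-pool readout. All function names are hypothetical.

```python
import random

def build_frame_graph(num_frames, window=1):
    """Adjacency list linking each frame to frames at most `window` steps away.
    This linking rule is an assumption, not the paper's construction."""
    neighbors = [[] for _ in range(num_frames)]
    for i in range(num_frames):
        lo, hi = max(0, i - window), min(num_frames, i + window + 1)
        neighbors[i] = [j for j in range(lo, hi) if j != i]
    return neighbors

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def sage_mean_layer(features, neighbors, W_self, W_neigh):
    """One GraphSAGE layer with a mean aggregator:
    h_v' = ReLU(W_self h_v + W_neigh mean_{u in N(v)} h_u).
    This equals applying one weight matrix to concat(h_v, mean)."""
    out = []
    for v, h_v in enumerate(features):
        nbrs = neighbors[v]
        if nbrs:
            agg = [sum(features[u][d] for u in nbrs) / len(nbrs)
                   for d in range(len(h_v))]
        else:
            agg = [0.0] * len(h_v)  # isolated node: no neighbor signal
        z = [a + b for a, b in zip(matvec(W_self, h_v), matvec(W_neigh, agg))]
        out.append([max(0.0, t) for t in z])  # ReLU
    return out

def graph_embedding(features, neighbors, layers):
    """Stack layers, then mean-pool node states into one graph-level vector.
    Deeper layers see wider neighborhoods, giving a coarse-to-fine hierarchy."""
    h = features
    for W_self, W_neigh in layers:
        h = sage_mean_layer(h, neighbors, W_self, W_neigh)
    return [sum(node[d] for node in h) / len(h) for d in range(len(h[0]))]

def random_matrix(rows, cols, rng):
    """Untrained placeholder weights; a real model would learn these."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

if __name__ == "__main__":
    rng = random.Random(0)
    num_frames, in_dim, hidden = 10, 3, 4  # hypothetical sizes
    feats = [[rng.random() for _ in range(in_dim)] for _ in range(num_frames)]
    graph = build_frame_graph(num_frames, window=2)
    layers = [
        (random_matrix(hidden, in_dim, rng), random_matrix(hidden, in_dim, rng)),
        (random_matrix(hidden, hidden, rng), random_matrix(hidden, hidden, rng)),
    ]
    print(graph_embedding(feats, graph, layers))
```

In a full detector, the pooled vector would feed a trained binary classifier (cover versus stego). The weights above are random, so the printed embedding carries no detection meaning.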