Hierarchical graph neural network for compressed speech steganalysis

IF 4.5 Q2 COMPUTER SCIENCE, THEORY & METHODS

Array. Pub Date: 2025-09-25. DOI: 10.1016/j.array.2025.100510

Mustapha Hemis, Hamza Kheddar, Mohamed Chahine Ghanem, Bachir Boudraa

Array, volume 28, Article 100510. https://www.sciencedirect.com/science/article/pii/S2590005625001377


Abstract

Steganalysis methods based on deep learning (DL) often suffer from high computational complexity and generalize poorly across datasets. Detection in voice-over-IP (VoIP) speech streams is particularly difficult because low bit-rate encoding creates complex relational dependencies between speech frames. Conventional DL models, which treat data as simple sequences or grids, often fail to capture these inter-frame dependencies effectively. To address this gap, this paper presents the first application of a graph neural network (GNN), specifically the GraphSAGE architecture, to steganalysis of compressed VoIP speech streams. The method constructs graphs from VoIP streams in a straightforward manner and uses GraphSAGE to capture hierarchical steganalysis information, covering both fine-grained details and high-level patterns, thereby achieving high detection accuracy. Experimental results show that the approach performs well in uncovering quantization index modulation (QIM)-based steganographic patterns in VoIP signals. It achieves detection accuracy exceeding 98% even for short 0.5-second samples, and 95.17% accuracy under challenging conditions with low embedding rates, an improvement of 2.8% over the best-performing state-of-the-art methods. The model is also efficient, with an average detection time as low as 0.016 s for 0.5-second samples, an improvement of 0.003 s. This makes it suitable for online steganalysis, offering a superior balance between detection accuracy and efficiency under the constraint of short samples with low embedding rates.
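
The abstract does not spell out how the graph is built or how the network is configured, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that each speech frame becomes a node, that frames are linked to their temporal neighbours within a small window, and that each node carries a few numeric features standing in for quantized codec parameters. The function names, window size, feature sizes, and random weights are all hypothetical. The layer itself is the standard GraphSAGE mean aggregator: each node concatenates its own features with the mean of its neighbours' features, applies a linear projection, and passes the result through a ReLU. Stacking two layers lets the second layer see a wider neighbourhood, which is one plausible reading of "hierarchical" here.

```python
import numpy as np


def build_frame_graph(num_frames, window=1):
    # Hypothetical construction: link each frame to the frames at most
    # `window` positions before or after it.
    neighbors = []
    for i in range(num_frames):
        lo, hi = max(0, i - window), min(num_frames, i + window + 1)
        neighbors.append([j for j in range(lo, hi) if j != i])
    return neighbors


def sage_mean_layer(h, neighbors, weight):
    # GraphSAGE mean aggregator: concatenate each node's own features with
    # the mean of its neighbours' features, project, then apply ReLU.
    agg = np.stack([
        h[n].mean(axis=0) if n else np.zeros(h.shape[1]) for n in neighbors
    ])
    return np.maximum(np.concatenate([h, agg], axis=1) @ weight, 0.0)


def score_stream(features, weights, w_out, window=1):
    # Stack layers so deeper ones see wider neighbourhoods, mean-pool the
    # node embeddings into one graph vector, and squash to a [0, 1] score.
    neighbors = build_frame_graph(features.shape[0], window)
    h = features
    for weight in weights:
        h = sage_mean_layer(h, neighbors, weight)
    logit = h.mean(axis=0) @ w_out
    return 1.0 / (1.0 + np.exp(-logit))


rng = np.random.default_rng(0)
num_frames, feat_dim, hidden = 50, 3, 8  # illustrative sizes only
# Stand-in features: random integers scaled to [0, 1], not real codec data.
features = rng.integers(0, 128, size=(num_frames, feat_dim)) / 127.0
weights = [
    rng.normal(scale=0.1, size=(2 * feat_dim, hidden)),
    rng.normal(scale=0.1, size=(2 * hidden, hidden)),
]
w_out = rng.normal(scale=0.1, size=hidden)
score = score_stream(features, weights, w_out)  # untrained, so not meaningful
```

With trained weights, a score near 1 would be read as "stego" and a score near 0 as "cover". Here the weights are random, so the value only demonstrates that the pipeline runs end to end on one short sample.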
