Efficient Detection of QIM-Based VoIP Steganography Using Adjacent Frame Integration and Multi-Codeword Priority Attention

IF 3.9 · Zone 2 (Engineering & Technology) · Q2 ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Signal Processing Letters Pub Date : 2025-06-17 DOI:10.1109/LSP.2025.3580496

Cheng Zhang;Yue Yan;Shujuan Jiang;Zhong Chen

IEEE Signal Processing Letters, vol. 32, pp. 2534-2538. Full text: https://ieeexplore.ieee.org/document/11039055/

Citations: 0

Efficient Detection of QIM-Based VoIP Steganography Using Adjacent Frame Integration and Multi-Codeword Priority Attention

With the growing volume of VoIP traffic, many steganography algorithms exploit VoIP speech as a carrier, posing a threat to cybersecurity. Among them, quantization index modulation (QIM)-based VoIP steganography has demonstrated excellent stealth, making detection difficult. In recent years, a growing number of studies have focused on developing feasible steganalysis methods for detecting QIM-based VoIP steganography. Most previous studies have concentrated on improving detection performance while neglecting efficiency, leaving lightweight models under-researched. In online detection scenarios, detection efficiency is crucial. On the one hand, the long inference time of large models can delay warnings. On the other hand, the high computational requirements of these models make them difficult to deploy on remote devices, which reduces their practical value. In this letter, we propose a simple yet efficient model named EQVS (efficient QIM-based VoIP steganalysis network) for detecting QIM-based VoIP steganography. In EQVS, the fold and unfold operations are redesigned based on the characteristics of VoIP speech samples and the requirements of the QIM-based VoIP steganalysis task, to avoid disrupting correlation features. Then, a multi-codeword priority attention mechanism, inspired by the multi-query attention and retention mechanisms, redefines the calculation procedure for the query, key, and value matrices, as well as the normalization and softmax operations, to further reduce computational resource consumption in a single attention head. Experimental results demonstrate that EQVS outperforms other state-of-the-art models in both detection performance and efficiency.
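For readers unfamiliar with the attack being detected, the general idea of QIM embedding in a vector-quantized speech codec can be sketched as follows. This is a toy illustration, not the scheme evaluated in the letter and not any part of EQVS: the parity-based codebook split, the random codebook, and the names `sub_codebook`, `qim_embed`, and `qim_extract` are all assumptions made for the example.

```python
import math
import random


def sub_codebook(codebook_size, bit):
    """Indices of the sub-codebook labelled `bit`.

    Splitting by index parity is an illustrative assumption; real schemes
    choose the partition more carefully to limit distortion.
    """
    return [i for i in range(codebook_size) if i % 2 == bit]


def qim_embed(vector, codebook, bit):
    """Quantize `vector` to the nearest codeword inside the sub-codebook for `bit`."""
    candidates = sub_codebook(len(codebook), bit)
    return min(candidates, key=lambda i: math.dist(codebook[i], vector))


def qim_extract(index):
    """Recover the hidden bit from the transmitted codeword index."""
    return index % 2


# Toy data: a random 16-entry codebook of 4-dimensional codewords (hypothetical values).
rng = random.Random(0)
codebook = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(16)]
frames = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(5)]
secret_bits = [0, 1, 1, 0, 1]

indices = [qim_embed(f, codebook, b) for f, b in zip(frames, secret_bits)]
recovered = [qim_extract(i) for i in indices]
print("codeword indices:", indices)
print("recovered bits:  ", recovered)
```

A steganalyzer sees only the stream of codeword indices. Because embedding restricts each frame to half of the codebook, the relationships between indices in neighbouring frames can shift slightly, which is presumably the kind of correlation feature the abstract says the redesigned fold and unfold operations are meant to preserve.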

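Source journal: IEEE Signal Processing Letters (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 7.40

Self-citation rate: 12.80%

Articles published: 339

Review time: 2.8 months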

About the journal: IEEE Signal Processing Letters is a monthly, archival publication designed to provide rapid dissemination of original, cutting-edge ideas and timely, significant contributions in signal, image, speech, language and audio processing. Papers published in the Letters can be presented within one year of their appearance at signal processing conferences such as ICASSP, GlobalSIP and ICIP, and also at several workshops organized by the Signal Processing Society.