Efficient Streaming Voice Steganalysis in Challenging Detection Scenarios

IF 8 · CAS Tier 1, Computer Science · Q1 COMPUTER SCIENCE, THEORY & METHODS

IEEE Transactions on Information Forensics and Security Pub Date : 2025-06-10 DOI:10.1109/TIFS.2025.3577042

Pengcheng Zhou;Zhengyang Fang;Zhongliang Yang;Zhili Zhou;Linna Zhou

Journal Article · IEEE Transactions on Information Forensics and Security, vol. 20, pp. 5966-5977 · https://ieeexplore.ieee.org/document/11030273/

Citations: 0

Abstract

In recent years, a growing number of information hiding techniques have targeted network streaming media, focusing on how to covertly and efficiently embed secret information into network media signals transmitted in real time to achieve concealed communication. Misuse of these techniques can lead to significant security risks, such as the spread of malicious code, commands, and viruses. Current steganalysis methods for network voice streams face two major challenges: efficient detection at low embedding rates and efficient detection on short-duration segments. These challenges arise because, at low embedding rates (e.g., as low as 10%) and short transmission durations (e.g., only 0.1 s), detection models struggle to acquire sufficiently rich sample features, which makes effective steganalysis difficult. To address these challenges, this paper introduces a Dual-View VoIP Steganalysis Framework (DVSF). The framework first randomly obfuscates parts of the native steganographic descriptors in VoIP stream segments, making the steganographic features of hard-to-detect samples more pronounced and easier to learn. It then captures fine-grained local features related to steganography, building on the global features of VoIP. Specially constructed VoIP segment triplets further adjust the feature distances within the model. Together, these steps address the detection difficulty in VoIP. Extensive experiments demonstrate that our method significantly improves the accuracy of streaming voice steganalysis in these challenging detection scenarios, surpassing existing state-of-the-art methods and offering superior near-real-time performance.
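
The abstract does not spell out how the obfuscation or the triplet-based distance adjustment is implemented, so the following is only a minimal sketch of the two general ideas, not the authors' code. It assumes a segment is an integer array of codeword indices (a hypothetical 10 frames by 3 indices), uses a hypothetical `MASK_ID` placeholder for obfuscated entries, and swaps in the standard triplet margin loss as a familiar stand-in for "adjusting feature distances". The function names `mask_codewords` and `triplet_margin_loss` are illustrative only.

```python
import numpy as np

MASK_ID = -1  # hypothetical placeholder value for an obfuscated entry


def mask_codewords(segment, mask_ratio, rng, mask_id=MASK_ID):
    """Randomly replace roughly `mask_ratio` of the entries with `mask_id`.

    Returns the masked copy and the boolean mask that was applied.
    """
    segment = np.asarray(segment)
    # Each entry is masked independently with probability `mask_ratio`.
    mask = rng.random(segment.shape) < mask_ratio
    masked = np.where(mask, mask_id, segment)
    return masked, mask


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss on batches of feature vectors.

    Pulls anchor/positive together and pushes anchor/negative apart
    until their distances differ by at least `margin`.
    """
    d_pos = np.linalg.norm(anchor - positive, axis=-1)
    d_neg = np.linalg.norm(anchor - negative, axis=-1)
    return float(np.maximum(d_pos - d_neg + margin, 0.0).mean())


# Usage: an assumed short segment of 10 frames with 3 codeword indices each.
rng = np.random.default_rng(0)
segment = rng.integers(0, 128, size=(10, 3))
masked, mask = mask_codewords(segment, mask_ratio=0.3, rng=rng)

# Toy 2-D features: the negative is far from the anchor, so the loss is zero.
anchor = np.array([[0.0, 0.0]])
positive = np.array([[0.0, 0.0]])
negative = np.array([[3.0, 4.0]])
loss = triplet_margin_loss(anchor, positive, negative, margin=1.0)
```

In a real detector the features passed to the loss would come from a learned encoder applied to cover and stego segments; here they are hand-picked vectors so the arithmetic is easy to follow.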

Source journal

IEEE Transactions on Information Forensics and Security (Engineering & Technology - Engineering: Electrical & Electronic)

CiteScore

14.40

Self-citation rate

7.40%

Articles published

234

Review time

6.5 months

About the journal: The IEEE Transactions on Information Forensics and Security covers the sciences, technologies, and applications relating to information forensics, information security, biometrics, surveillance, and systems applications that incorporate these features.