Cross-domain redundancy exploration by a deep encoder–decoder network for speech steganography
Xiaoyi Ge, Xiongwei Zhang, Meng Sun, Yimin Wang, Li Li, Kunkun SongGong
Journal of Information Security and Applications, vol. 93, Article 104150 (published 2025-06-30)
DOI: 10.1016/j.jisa.2025.104150
https://www.sciencedirect.com/science/article/pii/S2214212625001875
Citations: 0
Abstract
Speech steganography embeds messages within openly transmitted speech channels without arousing suspicion. However, current methods for embedding speech in speech suffer from weak imperceptibility and poor intelligibility of the recovered message speech. In this paper, we introduce a novel approach that explores cross-domain redundancy, using a deep encoder–decoder neural network to embed Mel-spectrograms into magnitude spectrograms. Specifically, the message is transformed into its Mel-spectrogram, while the cover is transformed into its magnitude spectrogram. The Mel-spectrogram is then embedded as a residual in the magnitude spectrogram by an encoder, the spectrogram super-resolution network (SSRN). Upon receiving the stego, a decoder network recovers the Mel-spectrogram of the message, and a high-fidelity HiFi-GAN vocoder then reconstructs the message waveform. The parameters of the encoder–decoder network are optimized to ensure both imperceptibility and high quality. To validate the superiority of the proposed method, we compare it with recently proposed baselines on the widely used LJ Speech and VCTK datasets. Experimental results show that our method achieves SNRs of 33.83 dB and 30.28 dB for the cover signals on these two datasets, respectively. Furthermore, both the content and the speaker identity of the recovered messages are well preserved, and the experiments also confirm the robustness of our approach against noise and its security.
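The abstract relies on two different spectral views of speech: a linear-frequency magnitude spectrogram for the cover and a compact Mel-spectrogram for the message. The NumPy sketch below computes both and implements the waveform SNR used to report imperceptibility. It is not the authors' SSRN encoder or their decoder, which are learned networks. The front-end settings (22.05 kHz audio, 1024-point FFT, hop 256, 80 Mel bands up to 8 kHz, chosen to resemble a common HiFi-GAN setup), the simple HTK-style triangular filterbank, the log floor, and every function name here are assumptions made for illustration.

```python
import numpy as np

# Assumed front-end settings (hypothetical; the abstract does not state the
# paper's STFT/Mel parameters).
SR, N_FFT, HOP, N_MELS, FMAX = 22050, 1024, 256, 80, 8000.0


def magnitude_spectrogram(x, n_fft=N_FFT, hop=HOP):
    """Cover-side view: |STFT| with a Hann window, shape (n_fft//2 + 1, frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels=N_MELS, fmin=0.0, fmax=None):
    """Simple HTK-style triangular filterbank, shape (n_mels, n_fft//2 + 1)."""
    if fmax is None:
        fmax = sr / 2
    # n_mels + 2 edge points, equally spaced on the Mel scale.
    hz_points = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    bins = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    fb = np.zeros((n_mels, bins.size))
    for m in range(n_mels):
        lo, center, hi = hz_points[m : m + 3]
        rising = (bins - lo) / (center - lo)
        falling = (hi - bins) / (hi - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(x, sr=SR):
    """Message-side view: log-compressed Mel-spectrogram, shape (n_mels, frames)."""
    mel = mel_filterbank(sr, N_FFT, fmax=FMAX) @ magnitude_spectrogram(x)
    return np.log(np.maximum(mel, 1e-5))  # floor avoids log(0) in silent bins


def snr_db(cover, stego):
    """SNR of the stego waveform relative to the cover, in dB."""
    residual = cover - stego
    return 10.0 * np.log10(np.sum(cover ** 2) / np.sum(residual ** 2))


def residual_rms_ratio(snr):
    """Residual RMS as a fraction of cover RMS implied by a given SNR in dB."""
    return 10.0 ** (-snr / 20.0)


if __name__ == "__main__":
    t = np.arange(SR) / SR  # 1 s of synthetic audio standing in for real speech
    cover = np.sin(2 * np.pi * 220.0 * t)
    message = np.sin(2 * np.pi * 440.0 * t)
    print("cover magnitude spectrogram:", magnitude_spectrogram(cover).shape)  # (513, 83)
    print("message log-Mel spectrogram:", log_mel_spectrogram(message).shape)  # (80, 83)
    for snr in (33.83, 30.28):
        print(f"{snr} dB -> residual RMS ~ {residual_rms_ratio(snr):.1%} of cover RMS")
```

Under these assumed settings, each frame has 513 linear-frequency bins on the cover side but only 80 Mel bands on the message side; this is presumably the cross-domain gap the super-resolution encoder bridges when it writes the Mel-spectrogram into the magnitude spectrogram as a residual. Inverting the SNR also puts the reported figures in perspective: assuming the SNRs are measured on waveforms, 33.83 dB and 30.28 dB correspond to a residual whose RMS is roughly 2% and 3% of the cover's.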
About the journal:
Journal of Information Security and Applications (JISA) focuses on original research and practice-driven applications relevant to information security and its applications. JISA links a vibrant scientific research community with industry professionals by offering a clear view of modern problems and challenges in information security and by identifying promising scientific and "best-practice" solutions. JISA issues balance original research with innovative industrial approaches from internationally renowned information security experts and researchers.