利用频率特征掩蔽增强增强噪声环境下的语音欺骗检测

IF 5.1 2区工程技术 Q1 ENGINEERING, MULTIDISCIPLINARY

Engineering Science and Technology-An International Journal-Jestech Pub Date : 2025-02-05 DOI:10.1016/j.jestch.2025.101972

Soyul Han , Jaejin Seo , Sunmook Choi , Taein Kang , Sanghyeok Chung , Seungeun Lee , Seoyoung Park , Seungsang Oh , Il-Youp Kwak

{"title":"利用频率特征掩蔽增强增强噪声环境下的语音欺骗检测","authors":"Soyul Han , Jaejin Seo , Sunmook Choi , Taein Kang , Sanghyeok Chung , Seungeun Lee , Seoyoung Park , Seungsang Oh , Il-Youp Kwak","doi":"10.1016/j.jestch.2025.101972","DOIUrl":null,"url":null,"abstract":"<div><div>In the rapidly evolving landscape of voice-related technology, high-tech companies are developing multifaceted voice assistants, tailored to their specific organizational goals. This technological evolution, however, introduces heightened security vulnerabilities such as voice spoofing attacks. To address voice spoofing challenges, various competitions like ASVspoof 2015, 2017, 2019, 2021, and ADD 2022 have emerged. ADD 2022’s Track 1 aimed to classify genuine and fake speech signals in the presence of noise. Our exploratory data analysis revealed that for a given speech sample, noisy signals tend to occur within similar frequency bands. If a model is heavily reliant on data within frequency ranges that contains noise, its performance will be suboptimal. To address this issue, we propose a data augmentation technique called Frequency Feature Masking (FFM), which randomly masks frequency bands. FFM helps prevent overfitting and enhances the model’s robustness by avoiding reliance on specific frequency bands. Furthermore, we propose a frequency band masking method using a bell-shaped filter. This allows for smooth transitions between masked and unmasked frequencies, enabling the model to naturally mimic frequency variations in real speech signals. We compare the performance of various data augmentation methods with FFM in two spoofing detection datasets, ASVspoof 2019 LA and ADD 2022. The proposed FFM augmentation achieves state-of-the-art results in both datasets. The ADD 2022 dataset showed an improvement of approximately 51% after the application of FFM, while there was a 54% improvement in the ASVspoof 2019 LA dataset. In addition, we have made the code and demo used in the experiment publicly available.</div></div>","PeriodicalId":48609,"journal":{"name":"Engineering Science and Technology-An International Journal-Jestech","volume":"63 ","pages":"Article 101972"},"PeriodicalIF":5.1000,"publicationDate":"2025-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Enhancing voice spoofing detection in noisy environments using frequency feature masking augmentation\",\"authors\":\"Soyul Han , Jaejin Seo , Sunmook Choi , Taein Kang , Sanghyeok Chung , Seungeun Lee , Seoyoung Park , Seungsang Oh , Il-Youp Kwak\",\"doi\":\"10.1016/j.jestch.2025.101972\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>In the rapidly evolving landscape of voice-related technology, high-tech companies are developing multifaceted voice assistants, tailored to their specific organizational goals. This technological evolution, however, introduces heightened security vulnerabilities such as voice spoofing attacks. To address voice spoofing challenges, various competitions like ASVspoof 2015, 2017, 2019, 2021, and ADD 2022 have emerged. ADD 2022’s Track 1 aimed to classify genuine and fake speech signals in the presence of noise. Our exploratory data analysis revealed that for a given speech sample, noisy signals tend to occur within similar frequency bands. If a model is heavily reliant on data within frequency ranges that contains noise, its performance will be suboptimal. To address this issue, we propose a data augmentation technique called Frequency Feature Masking (FFM), which randomly masks frequency bands. FFM helps prevent overfitting and enhances the model’s robustness by avoiding reliance on specific frequency bands. Furthermore, we propose a frequency band masking method using a bell-shaped filter. This allows for smooth transitions between masked and unmasked frequencies, enabling the model to naturally mimic frequency variations in real speech signals. We compare the performance of various data augmentation methods with FFM in two spoofing detection datasets, ASVspoof 2019 LA and ADD 2022. The proposed FFM augmentation achieves state-of-the-art results in both datasets. The ADD 2022 dataset showed an improvement of approximately 51% after the application of FFM, while there was a 54% improvement in the ASVspoof 2019 LA dataset. In addition, we have made the code and demo used in the experiment publicly available.</div></div>\",\"PeriodicalId\":48609,\"journal\":{\"name\":\"Engineering Science and Technology-An International Journal-Jestech\",\"volume\":\"63 \",\"pages\":\"Article 101972\"},\"PeriodicalIF\":5.1000,\"publicationDate\":\"2025-02-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Engineering Science and Technology-An International Journal-Jestech\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2215098625000278\",\"RegionNum\":2,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, MULTIDISCIPLINARY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Engineering Science and Technology-An International Journal-Jestech","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2215098625000278","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, MULTIDISCIPLINARY","Score":null,"Total":0}

引用次数: 0

摘要

在快速发展的语音相关技术领域，高科技公司正在开发多方面的语音助手，以满足他们特定的组织目标。然而，这种技术发展带来了更严重的安全漏洞，例如语音欺骗攻击。为了应对语音欺骗的挑战，已经出现了ASVspoof 2015、2017、2019、2021和ADD 2022等各种比赛。ADD 2022的轨道1旨在在存在噪声的情况下对真实和虚假语音信号进行分类。我们的探索性数据分析表明，对于给定的语音样本，噪声信号往往出现在相似的频带内。如果一个模型严重依赖于包含噪声的频率范围内的数据，那么它的性能将是次优的。为了解决这个问题，我们提出了一种称为频率特征掩蔽（FFM）的数据增强技术，它随机掩蔽频段。FFM有助于防止过拟合，并通过避免依赖特定频段来增强模型的鲁棒性。此外，我们提出了一种使用钟形滤波器的频带掩蔽方法。这允许在屏蔽和未屏蔽频率之间平滑过渡，使模型能够自然地模拟真实语音信号中的频率变化。我们比较了各种数据增强方法与FFM在两个欺骗检测数据集（ASVspoof 2019 LA和ADD 2022）中的性能。所提出的FFM增强在两个数据集中都达到了最先进的结果。应用FFM后，ADD 2022数据集的性能提高了约51%，而ASVspoof 2019 LA数据集的性能提高了54%。此外，我们已经公开了实验中使用的代码和演示。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Enhancing voice spoofing detection in noisy environments using frequency feature masking augmentation

In the rapidly evolving landscape of voice-related technology, high-tech companies are developing multifaceted voice assistants, tailored to their specific organizational goals. This technological evolution, however, introduces heightened security vulnerabilities such as voice spoofing attacks. To address voice spoofing challenges, various competitions like ASVspoof 2015, 2017, 2019, 2021, and ADD 2022 have emerged. ADD 2022’s Track 1 aimed to classify genuine and fake speech signals in the presence of noise. Our exploratory data analysis revealed that for a given speech sample, noisy signals tend to occur within similar frequency bands. If a model is heavily reliant on data within frequency ranges that contains noise, its performance will be suboptimal. To address this issue, we propose a data augmentation technique called Frequency Feature Masking (FFM), which randomly masks frequency bands. FFM helps prevent overfitting and enhances the model’s robustness by avoiding reliance on specific frequency bands. Furthermore, we propose a frequency band masking method using a bell-shaped filter. This allows for smooth transitions between masked and unmasked frequencies, enabling the model to naturally mimic frequency variations in real speech signals. We compare the performance of various data augmentation methods with FFM in two spoofing detection datasets, ASVspoof 2019 LA and ADD 2022. The proposed FFM augmentation achieves state-of-the-art results in both datasets. The ADD 2022 dataset showed an improvement of approximately 51% after the application of FFM, while there was a 54% improvement in the ASVspoof 2019 LA dataset. In addition, we have made the code and demo used in the experiment publicly available.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Engineering Science and Technology-An International Journal-Jestech Materials Science-Electronic, Optical and Magnetic Materials

CiteScore

11.20

自引率

3.50%

发文量

153

审稿时长

22 days

期刊介绍： Engineering Science and Technology, an International Journal (JESTECH) (formerly Technology), a peer-reviewed quarterly engineering journal, publishes both theoretical and experimental high quality papers of permanent interest, not previously published in journals, in the field of engineering and applied science which aims to promote the theory and practice of technology and engineering. In addition to peer-reviewed original research papers, the Editorial Board welcomes original research reports, state-of-the-art reviews and communications in the broadly defined field of engineering science and technology. The scope of JESTECH includes a wide spectrum of subjects including: -Electrical/Electronics and Computer Engineering (Biomedical Engineering and Instrumentation; Coding, Cryptography, and Information Protection; Communications, Networks, Mobile Computing and Distributed Systems; Compilers and Operating Systems; Computer Architecture, Parallel Processing, and Dependability; Computer Vision and Robotics; Control Theory; Electromagnetic Waves, Microwave Techniques and Antennas; Embedded Systems; Integrated Circuits, VLSI Design, Testing, and CAD; Microelectromechanical Systems; Microelectronics, and Electronic Devices and Circuits; Power, Energy and Energy Conversion Systems; Signal, Image, and Speech Processing) -Mechanical and Civil Engineering (Automotive Technologies; Biomechanics; Construction Materials; Design and Manufacturing; Dynamics and Control; Energy Generation, Utilization, Conversion, and Storage; Fluid Mechanics and Hydraulics; Heat and Mass Transfer; Micro-Nano Sciences; Renewable and Sustainable Energy Technologies; Robotics and Mechatronics; Solid Mechanics and Structure; Thermal Sciences) -Metallurgical and Materials Engineering (Advanced Materials Science; Biomaterials; Ceramic and Inorgnanic Materials; Electronic-Magnetic Materials; Energy and Environment; Materials Characterizastion; Metallurgy; Polymers and Nanocomposites)