使用自适应可学习波束形成器的多通道重放语音检测

IF 2.9 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE open journal of signal processing Pub Date : 2025-03-09 DOI:10.1109/OJSP.2025.3568758

Michael Neri;Tuomas Virtanen

{"title":"使用自适应可学习波束形成器的多通道重放语音检测","authors":"Michael Neri;Tuomas Virtanen","doi":"10.1109/OJSP.2025.3568758","DOIUrl":null,"url":null,"abstract":"Replay attacks belong to the class of severe threats against voice-controlled systems, exploiting the easy accessibility of speech signals by recorded and replayed speech to grant unauthorized access to sensitive data. In this work, we propose a multi-channel neural network architecture called M-ALRAD for the detection of replay attacks based on spatial audio features. This approach integrates a learnable adaptive beamformer with a convolutional recurrent neural network, allowing for joint optimization of spatial filtering and classification. Experiments have been carried out on the ReMASC dataset, which is a state-of-the-art multi-channel replay speech detection dataset encompassing four microphones with diverse array configurations and four environments. Results on the ReMASC dataset show the superiority of the approach compared to the state-of-the-art and yield substantial improvements for challenging acoustic environments. In addition, we demonstrate that our approach is able to better generalize to unseen environments with respect to prior studies.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"530-535"},"PeriodicalIF":2.9000,"publicationDate":"2025-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10994395","citationCount":"0","resultStr":"{\"title\":\"Multi-Channel Replay Speech Detection Using an Adaptive Learnable Beamformer\",\"authors\":\"Michael Neri;Tuomas Virtanen\",\"doi\":\"10.1109/OJSP.2025.3568758\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Replay attacks belong to the class of severe threats against voice-controlled systems, exploiting the easy accessibility of speech signals by recorded and replayed speech to grant unauthorized access to sensitive data. In this work, we propose a multi-channel neural network architecture called M-ALRAD for the detection of replay attacks based on spatial audio features. This approach integrates a learnable adaptive beamformer with a convolutional recurrent neural network, allowing for joint optimization of spatial filtering and classification. Experiments have been carried out on the ReMASC dataset, which is a state-of-the-art multi-channel replay speech detection dataset encompassing four microphones with diverse array configurations and four environments. Results on the ReMASC dataset show the superiority of the approach compared to the state-of-the-art and yield substantial improvements for challenging acoustic environments. In addition, we demonstrate that our approach is able to better generalize to unseen environments with respect to prior studies.\",\"PeriodicalId\":73300,\"journal\":{\"name\":\"IEEE open journal of signal processing\",\"volume\":\"6 \",\"pages\":\"530-535\"},\"PeriodicalIF\":2.9000,\"publicationDate\":\"2025-03-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10994395\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE open journal of signal processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10994395/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE open journal of signal processing","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10994395/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}

引用次数: 0

摘要

重播攻击属于对语音控制系统的严重威胁，利用录音和重播语音容易访问语音信号来授予对敏感数据的未经授权访问。在这项工作中，我们提出了一种称为M-ALRAD的多通道神经网络架构，用于检测基于空间音频特征的重播攻击。该方法将可学习的自适应波束形成器与卷积递归神经网络集成在一起，允许空间滤波和分类的联合优化。在ReMASC数据集上进行了实验，ReMASC数据集是一个最先进的多通道重播语音检测数据集，包含四个具有不同阵列配置和四种环境的麦克风。ReMASC数据集的结果表明，与最先进的方法相比，该方法具有优势，并且在具有挑战性的声学环境中产生了实质性的改进。此外，我们证明，与之前的研究相比，我们的方法能够更好地推广到看不见的环境。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Multi-Channel Replay Speech Detection Using an Adaptive Learnable Beamformer

Replay attacks belong to the class of severe threats against voice-controlled systems, exploiting the easy accessibility of speech signals by recorded and replayed speech to grant unauthorized access to sensitive data. In this work, we propose a multi-channel neural network architecture called M-ALRAD for the detection of replay attacks based on spatial audio features. This approach integrates a learnable adaptive beamformer with a convolutional recurrent neural network, allowing for joint optimization of spatial filtering and classification. Experiments have been carried out on the ReMASC dataset, which is a state-of-the-art multi-channel replay speech detection dataset encompassing four microphones with diverse array configurations and four environments. Results on the ReMASC dataset show the superiority of the approach compared to the state-of-the-art and yield substantial improvements for challenging acoustic environments. In addition, we demonstrate that our approach is able to better generalize to unseen environments with respect to prior studies.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

IEEE open journal of signal processing

CiteScore

5.30

自引率

0.00%

发文量

审稿时长

22 weeks