Anti-Replay: A Fast and Lightweight Voice Replay Attack Detection System
Authors: Zhuoyang Shi, Chaohao Li, Zizhi Jin, Weinong Sun, Xiaoyu Ji, Wenyuan Xu
Venue: 2021 IEEE 27th International Conference on Parallel and Distributed Systems (ICPADS)
Published: 2021-12-01
DOI: https://doi.org/10.1109/ICPADS53394.2021.00027
Because voice and voice interfaces are open by nature, attackers can easily record a user's voice commands and spoof voice recognition systems by replaying them. Existing methods for detecting voice replay attacks mainly rely on extra hardware to determine the sound source, or require excessive computing resources to train a classifier on a large number of acoustic features. Hence, we propose Anti-Replay, a fast and lightweight detection system for voice replay attacks. To overcome the challenges of redundant classification feature vectors and complex calculation, we first investigate the spectral difference between live human voice and replayed audio, which is caused by the non-linear distortion of the attacker's microphones and speakers, and then extract 72-dimensional feature vectors. We then employ a single deep convolutional neural network classifier (SE-ResNet50) to enhance the robustness of our classification model. Finally, we evaluate Anti-Replay on the ASVspoof2017 and ASVspoof2019 datasets. Results show that Anti-Replay achieves an equal error rate (EER) of 2.38% and 0.82% on the two datasets, respectively. Meanwhile, the training time and model size of Anti-Replay are reduced by 56% and 84%, respectively, compared with the baseline model (i.e., CQCC-GMM).
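For readers unfamiliar with the metric reported above, the following is a minimal sketch of how an equal error rate can be approximated from classifier scores. It is not the authors' evaluation code: the function name `equal_error_rate`, the convention that higher scores mean "live human speech", and the toy scores are all assumptions made for illustration. The EER is the operating point at which the false acceptance rate (replayed audio accepted) equals the false rejection rate (live speech rejected).

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Assumption: a higher score means "more likely live human speech".
    A sample is accepted when its score is >= the threshold.
    """
    if not genuine_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")

    # Candidate thresholds: every observed score, plus one above the maximum
    # so that the "reject everything" operating point is also considered.
    thresholds = sorted(set(genuine_scores) | set(spoof_scores))
    thresholds.append(thresholds[-1] + 1.0)

    best_gap = None
    eer = None
    for t in thresholds:
        # False acceptance rate: replayed samples that pass the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # False rejection rate: live samples that fall below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap = gap
            eer = (far + frr) / 2  # midpoint where the two rates are closest
    return eer


if __name__ == "__main__":
    # Hypothetical toy scores, not taken from the paper.
    live = [0.9, 0.8, 0.7, 0.4]
    replayed = [0.1, 0.2, 0.3, 0.6]
    print(f"EER = {equal_error_rate(live, replayed):.2%}")
```

With finitely many scores the two error rates may never coincide exactly, so this sketch reports the midpoint at the threshold where they are closest; evaluation toolkits typically interpolate along the error curve instead.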