Delta feature maps with application to spoofed speech detection
Gökay Dişken
Computers & Electrical Engineering, Vol. 128, Article 110748 (published 2025-10-08). DOI: 10.1016/j.compeleceng.2025.110748
Convolutional layers are used in many deep learning architectures because of their feature extraction capabilities. Besides traditional convolution, several modified convolution techniques have been proposed. Among them, differential convolution generates additional feature maps from the differences between activations along a selected direction. It was found to be effective for image recognition, using pre-defined fixed filters that focus on two adjacent activations. For speech-related tasks, tracking dynamic information over a broader range may be beneficial. With this in mind, this paper proposes delta feature maps, in which the fixed filters of differential convolution are modified based on the computation of handcrafted delta cepstral features. The proposed filters can extract dynamic information, similar to delta cepstral features, within a convolutional neural network. Handcrafted delta and/or delta-delta features have proven effective, especially for synthetic speech detection. Hence, the logical access (LA) condition of ASVspoof 2019 and the recent ASVspoof 5 dataset are used to verify the effectiveness of the delta feature maps. For ASVspoof 2019, the residual time-domain synthetic speech detection net (Res-TSSDNet) is used as a 1-D model and the one-class neural network with directed statistics pooling (OCNet-DSP) is used as a 2-D model, verifying that delta feature maps work with both 1-D and 2-D models. As ASVspoof 5 is a more challenging dataset, data augmentation, a pre-trained foundation model front-end, and a Nes2Net-X back-end are used. Delta feature maps are integrated into Nes2Net-X in two different configurations. One configuration reduced the back-end size from 291 K to 76 K (about a 74% reduction) while preserving performance. The other achieved an equal error rate (EER) of 4.33%, the lowest among the reported single systems that use a pre-trained foundation model.
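The abstract does not list the exact filter coefficients, so the following is only a minimal sketch of the idea, assuming the delta feature maps reuse the standard delta-regression weights of handcrafted delta cepstra, d_t = sum_{n=1..N} n (c_{t+n} - c_{t-n}) / (2 sum_{n=1..N} n^2), as a fixed (non-learned) filter along the time axis of an activation map. With N = 1 the filter reduces to a central difference over adjacent activations, while larger N covers the broader range the abstract argues for. The edge handling (frame replication), the concatenation of the delta maps with the input along the channel axis, and the names `delta_filter`, `delta_feature_map`, `with_delta_maps`, `n_context`, and `order` are all hypothetical choices for illustration, not the paper's implementation.

```python
import numpy as np


def delta_filter(n_context: int = 2) -> np.ndarray:
    """Fixed delta-regression weights for offsets -N..N (hypothetical helper).

    Weight at offset n is n / (2 * sum_{k=1..N} k^2), which reproduces
    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum n^2).
    """
    offsets = np.arange(-n_context, n_context + 1, dtype=float)
    denom = 2.0 * np.sum(np.arange(1, n_context + 1, dtype=float) ** 2)
    return offsets / denom


def delta_feature_map(x: np.ndarray, n_context: int = 2, axis: int = -1) -> np.ndarray:
    """Apply the fixed delta filter along `axis` (assumed to be time).

    Assumption: edges are handled by replicating the first/last frame.
    """
    w = delta_filter(n_context)
    x = np.moveaxis(x, axis, -1)  # put the time axis last
    pad = [(0, 0)] * (x.ndim - 1) + [(n_context, n_context)]
    xp = np.pad(x, pad, mode="edge")
    length = x.shape[-1]
    out = np.zeros_like(x, dtype=float)
    # Index k of the padded array corresponds to offset k - n_context.
    for k, wk in enumerate(w):
        out += wk * xp[..., k:k + length]
    return np.moveaxis(out, -1, axis)  # restore the original axis order


def with_delta_maps(x: np.ndarray, n_context: int = 2, time_axis: int = -1,
                    channel_axis: int = 0, order: int = 2) -> np.ndarray:
    """Stack x with its delta (and delta-delta, ...) maps along the channel axis.

    Higher orders apply the same fixed filter repeatedly, as with handcrafted
    delta-delta cepstra.
    """
    maps = [x.astype(float)]
    d = x.astype(float)
    for _ in range(order):
        d = delta_feature_map(d, n_context, axis=time_axis)
        maps.append(d)
    return np.concatenate(maps, axis=channel_axis)
```

For a 1-D model the input would be a (channels, time) map; for a 2-D model such as a (channels, frequency, time) map, the same call with `time_axis=-1, channel_axis=0` triples the channel count when `order=2`. Because the filter is fixed, the extra maps add no trainable parameters; whether the paper concatenates, adds, or otherwise combines them is not stated in the abstract.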
About the journal:
The impact of computers has nowhere been more revolutionary than in electrical engineering. The design, analysis, and operation of electrical and electronic systems are now dominated by computers, a transformation that has been motivated by the natural ease of interface between computers and electrical systems, and the promise of spectacular improvements in speed and efficiency.
Published since 1973, Computers & Electrical Engineering provides rapid publication of topical research into the integration of computer technology and computational techniques with electrical and electronic systems. The journal publishes papers featuring novel implementations of computers and computational techniques in areas like signal and image processing, high-performance computing, parallel processing, and communications. Special attention will be paid to papers describing innovative architectures, algorithms, and software tools.