A novel speaker verification approach featuring multidomain acoustics based on the weighted city block Minkowski distance
Authors: Khushboo Jha, Sumit Srivastava, Aruna Jain
Journal: ETRI Journal, volume 47, issue 2, pages 227-243 (Journal Article)
Published: 2024-08-19
DOI: 10.4218/etrij.2023-0485
URL: https://onlinelibrary.wiley.com/doi/10.4218/etrij.2023-0485
PDF: https://onlinelibrary.wiley.com/doi/epdf/10.4218/etrij.2023-0485
Impact factor: 1.3; JCR: Q3 (Engineering, Electrical & Electronic); Region category: Computer Science (region 4)
Citations: 0
Abstract
Access control is vital in interconnected environments such as the Internet of Things, Industry 4.0, and smart connectivity, where it ensures that access is authorized. Biometric-based access, particularly speaker verification (SV), enhances security by using unique vocal features and offers nonintrusive authentication with continuous monitoring. Single-domain features are insufficient for distinguishing similar traits, prompting recent SV advances to adopt multidomain speech features. This paradigm addresses the limitations of single-domain features by combining the merits of the individual domains. It uses cepstral–frequency–time domain feature fusion, achieved via cepstral mean-variance normalization for generalizability. The weighted city block Minkowski distance is proposed to compare reference and test speech templates. Parameters are computed based on the confusion matrix, template matching distance functions, dynamic acoustic conditions, and additive white Gaussian noise. A deep convolutional neural network classifier is assessed on the open-source LibriSpeech and Speaker in the Wild corpora, surpassing current methods.
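The two building blocks named in the abstract can be illustrated with a minimal sketch. It assumes the textbook definitions only: cepstral mean-variance normalization shifts each feature coefficient to zero mean and unit variance across the frames of an utterance, and the weighted city block distance is the weighted Minkowski distance of order p = 1, that is, the sum of w_i * |r_i - t_i| over all coefficients. The abstract does not say how the paper chooses its weights or builds its templates, so the function names, the weight values, and the toy vectors below are hypothetical and do not reproduce the authors' implementation.

```python
from statistics import fmean, pstdev


def cmvn(frames):
    """Cepstral mean-variance normalization over the frames of one utterance.

    frames: list of equal-length feature vectors, one per frame.
    Each coefficient is shifted to zero mean and scaled to unit variance.
    """
    columns = list(zip(*frames))
    means = [fmean(col) for col in columns]
    # A constant coefficient has zero deviation; fall back to 1.0 to avoid
    # dividing by zero (an assumption of this sketch, not from the paper).
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(frame, means, stds)]
        for frame in frames
    ]


def weighted_city_block(reference, test, weights):
    """Weighted Minkowski distance of order p = 1 between two templates."""
    if not (len(reference) == len(test) == len(weights)):
        raise ValueError("reference, test, and weights must have equal length")
    return sum(w * abs(r - t) for r, t, w in zip(reference, test, weights))


if __name__ == "__main__":
    # Toy two-frame utterance with two coefficients per frame.
    print(cmvn([[1.0, 10.0], [3.0, 10.0]]))
    # Hypothetical reference and test templates with illustrative weights.
    print(weighted_city_block([1.0, 2.0, 3.0], [2.0, 0.0, 3.0], [0.5, 1.0, 2.0]))
```

A smaller distance indicates a closer match between the test template and the enrolled reference. In a verification setting this distance would be compared against a decision threshold, which the abstract does not specify.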
About the journal:
ETRI Journal is an international, peer-reviewed multidisciplinary journal published bimonthly in English. The main focus of the journal is to provide an open forum to exchange innovative ideas and technology in the fields of information, telecommunications, and electronics.
Key topics of interest include high-performance computing, big data analytics, cloud computing, multimedia technology, communication networks and services, wireless communications and mobile computing, material and component technology, as well as security.
With an international editorial committee and experts from around the world as reviewers, ETRI Journal publishes high-quality research papers on the latest and best developments from the global community.