A novel speaker verification approach featuring multidomain acoustics based on the weighted city block Minkowski distance
Authors: Khushboo Jha, Sumit Srivastava, Aruna Jain
Journal: ETRI Journal, vol. 47, no. 2, pp. 227-243 (impact factor 1.3; JCR Q3, Engineering, Electrical & Electronic)
Published: 2024-08-19
DOI: 10.4218/etrij.2023-0485
Full text: https://onlinelibrary.wiley.com/doi/10.4218/etrij.2023-0485
Citations: 0
Abstract
Access control is vital in interconnected environments such as the Internet of Things, Industry 4.0, and smart connectivity, where it ensures that access is authorized. Biometric-based access, particularly speaker verification (SV), enhances security by using unique vocal features and offers nonintrusive authentication with continuous monitoring. Single-domain features are insufficient for distinguishing similar traits, prompting the latest SV advances to adopt multidomain speech features. This paradigm addresses the limitations of single-domain features by combining the merits of the individual domains. It uses cepstral–frequency–time domain feature fusion, achieved via cepstral mean-variance normalization for generalizability. The weighted city block Minkowski distance is proposed to compare reference and test speech templates. Parameters are computed based on the confusion matrix, template matching distance functions, dynamic acoustic conditions, and additive white Gaussian noise. A deep convolutional neural network classifier is assessed on the open-source LibriSpeech and Speaker in the Wild corpora and outperforms current methods.
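The abstract names three building blocks without giving their formulas: cepstral mean-variance normalization, a weighted Minkowski distance in its city block form, and additive white Gaussian noise. The sketch below illustrates the common textbook definitions of each, not the authors' implementation. The function names, the weight values, and the choice of order p = 1 for the city block case are all assumptions made for illustration; the paper's actual weighting scheme is not stated in the abstract.

```python
import math
import random
from statistics import fmean, pstdev


def cmvn(frames):
    """Cepstral mean-variance normalization over one utterance.

    frames: list of feature vectors (one list of floats per frame).
    Each feature dimension is shifted to zero mean and scaled to unit
    variance across the frames of the utterance.
    """
    columns = list(zip(*frames))  # one tuple per feature dimension
    means = [fmean(col) for col in columns]
    # A constant dimension has zero deviation; divide by 1.0 instead.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [[(x - m) / s for x, m, s in zip(frame, means, stds)]
            for frame in frames]


def weighted_minkowski(x, y, weights, p=1.0):
    """Weighted Minkowski distance: (sum_i w_i * |x_i - y_i|**p) ** (1/p).

    With p = 1 this is the weighted city block (Manhattan) distance.
    The weights are hypothetical; the source does not say how they are set.
    """
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y and weights must have the same length")
    total = sum(w * abs(a - b) ** p for a, b, w in zip(x, y, weights))
    return total ** (1.0 / p)


def add_awgn(signal, snr_db, rng=None):
    """Add white Gaussian noise at approximately the requested SNR in dB."""
    rng = rng or random.Random()
    signal_power = fmean(s * s for s in signal)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in signal]
```

For example, `weighted_minkowski([1, 2, 3], [2, 0, 3], [1, 2, 3])` evaluates 1·1 + 2·2 + 3·0 and returns 5.0. In a template-matching setting, a smaller distance between the normalized reference and test templates would indicate a closer match, with the acceptance threshold chosen on held-out data.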
About the journal:
ETRI Journal is an international, peer-reviewed multidisciplinary journal published bimonthly in English. The main focus of the journal is to provide an open forum to exchange innovative ideas and technology in the fields of information, telecommunications, and electronics.
Key topics of interest include high-performance computing, big data analytics, cloud computing, multimedia technology, communication networks and services, wireless communications and mobile computing, material and component technology, as well as security.
With an international editorial committee and experts from around the world as reviewers, ETRI Journal publishes high-quality research papers on the latest and best developments from the global community.