HRTF Clustering for Robust Training of a DNN for Sound Source Localization
Authors: Hugh O’Dwyer, F. Boland
Journal: Journal of the Audio Engineering Society (journal article), published 2022-12-12
DOI: https://doi.org/10.17743/jaes.2022.0051
This study shows how spherical sound source localization of binaural audio signals in the mismatched head-related transfer function (HRTF) condition can be improved by implementing HRTF clustering when using machine learning. A new feature set of cross-correlation function, interaural level difference, and Gammatone cepstral coefficients is introduced and shown to outperform state-of-the-art methods in vertical localization in the mismatched HRTF condition by up to 5%. By examining the performance of Deep Neural Networks trained on single HRTF sets from the CIPIC database on other HRTFs, it is shown that HRTF sets can be clustered into groups of similar HRTFs. This results in the formulation of central HRTF sets representative of their specific cluster. By training a machine learning algorithm on these central HRTFs, it is shown that a more robust algorithm can be trained capable of improving sound source localization accuracy by up to 13% in the mismatched HRTF condition. Concurrently, localization accuracy is decreased by approximately 6% in the matched HRTF condition, which accounts for less than 9% of all test conditions. Results demonstrate that HRTF clustering can vastly improve the robustness of binaural sound source localization to unseen HRTF conditions.
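The clustering idea in the abstract can be illustrated with a small sketch. The abstract does not say which clustering algorithm or distance measure the authors used, so everything below is an assumption made for illustration: the cross-test accuracy matrix `acc` holds made-up numbers (not results from the paper), the dissimilarity is taken as one minus the mean of the two cross-test accuracies, and an exhaustive k-medoids search stands in for the clustering step, with each medoid playing the role of a "central" HRTF set. The function names `dissimilarity` and `k_medoids` are hypothetical.

```python
from itertools import combinations


def dissimilarity(acc):
    """Build a symmetric dissimilarity matrix from cross-test accuracies.

    acc[i][j] is the accuracy (0..1) of a model trained on HRTF set i
    and tested on HRTF set j. Assumed measure: 1 - mean(acc[i][j], acc[j][i]).
    """
    n = len(acc)
    return [
        [0.0 if i == j else 1.0 - 0.5 * (acc[i][j] + acc[j][i]) for j in range(n)]
        for i in range(n)
    ]


def k_medoids(dist, k):
    """Exhaustive k-medoids: try every set of k medoids, keep the cheapest.

    Only practical for small collections (tens of HRTF sets, small k).
    Returns the medoid indices and, for each set, the index of its medoid.
    """
    n = len(dist)
    best_cost, best_medoids = float("inf"), None
    for medoids in combinations(range(n), k):
        # Cost = sum over all sets of the distance to the nearest medoid.
        cost = sum(min(dist[i][m] for m in medoids) for i in range(n))
        if cost < best_cost:
            best_cost, best_medoids = cost, medoids
    labels = [min(best_medoids, key=lambda m: dist[i][m]) for i in range(n)]
    return list(best_medoids), labels


# Illustrative (invented) cross-test accuracies for four HRTF sets:
# sets 0 and 1 generalize well to each other, as do sets 2 and 3.
acc = [
    [0.95, 0.80, 0.40, 0.35],
    [0.82, 0.96, 0.38, 0.42],
    [0.41, 0.36, 0.94, 0.78],
    [0.37, 0.40, 0.81, 0.95],
]

medoids, labels = k_medoids(dissimilarity(acc), k=2)
print("central sets:", medoids)
print("cluster of each set:", labels)
```

In this toy example, sets 0 and 1 end up in one cluster and sets 2 and 3 in the other. A localization model would then be trained on the central set of each cluster rather than on a single subject's HRTFs. Which member of a cluster is picked as the medoid can be a tie in data this small.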
About the journal:
The Journal of the Audio Engineering Society — the official publication of the AES — is the only peer-reviewed journal devoted exclusively to audio technology. Published 10 times each year, it is available to all AES members and subscribers.
The Journal contains state-of-the-art technical papers and engineering reports; feature articles covering timely topics; pre and post reports of AES conventions and other society activities; news from AES sections around the world; Standards and Education Committee work; membership news, patents, new products, and newsworthy developments in the field of audio.