HRTF Clustering for Robust Training of a DNN for Sound Source Localization

IF 1.1; Zone 4, Engineering and Technology; Q3, Acoustics
Hugh O’Dwyer, F. Boland
Journal of the Audio Engineering Society, journal article, published 2022-12-12. DOI: 10.17743/jaes.2022.0051 (https://doi.org/10.17743/jaes.2022.0051)

Abstract

This study shows how spherical sound source localization of binaural audio signals in the mismatched head-related transfer function (HRTF) condition can be improved by implementing HRTF clustering when using machine learning. A new feature set of cross-correlation function, interaural level difference, and Gammatone cepstral coefficients is introduced and shown to outperform state-of-the-art methods in vertical localization in the mismatched HRTF condition by up to 5%. By examining the performance of Deep Neural Networks trained on single HRTF sets from the CIPIC database on other HRTFs, it is shown that HRTF sets can be clustered into groups of similar HRTFs. This results in the formulation of central HRTF sets representative of their specific cluster. By training a machine learning algorithm on these central HRTFs, it is shown that a more robust algorithm can be trained capable of improving sound source localization accuracy by up to 13% in the mismatched HRTF condition. Concurrently, localization accuracy is decreased by approximately 6% in the matched HRTF condition, which accounts for less than 9% of all test conditions. Results demonstrate that HRTF clustering can vastly improve the robustness of binaural sound source localization to unseen HRTF conditions.
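The clustering idea in the abstract can be illustrated with a minimal sketch. The abstract does not say which clustering algorithm the authors used, so everything below is an assumption: the cross-set accuracy matrix `acc` is hypothetical toy data (entry `acc[i, j]` stands for the accuracy of a network trained on HRTF set `i` and tested on set `j`), the dissimilarity is a simple symmetrised `1 - accuracy`, and a plain k-medoids loop stands in for whatever procedure the paper applies. The medoid of each cluster plays the role of a "central" HRTF set.

```python
import numpy as np


def cluster_hrtf_sets(acc, k, n_iter=50, seed=0):
    """Group HRTF sets with k-medoids (an assumed stand-in, not the paper's method).

    acc[i, j] is a hypothetical accuracy of a model trained on set i
    and tested on set j. Returns (labels, medoids), where medoids are
    the indices of the most central set in each cluster.
    """
    acc = np.asarray(acc, dtype=float)
    # Symmetrise and convert to a dissimilarity: sets that generalise
    # well to each other end up close together.
    dist = 1.0 - 0.5 * (acc + acc.T)
    np.fill_diagonal(dist, 0.0)

    rng = np.random.default_rng(seed)
    medoids = rng.choice(dist.shape[0], size=k, replace=False)

    for _ in range(n_iter):
        # Assign every set to its nearest medoid.
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the old medoid for an empty cluster
            # The new medoid minimises total distance to its cluster members.
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids

    labels = np.argmin(dist[:, medoids], axis=1)
    return labels, medoids


# Toy data (invented for illustration): sets 0-2 generalise well to each
# other, as do sets 3-5, while accuracy across the two groups is low.
toy_acc = np.full((6, 6), 0.3)
toy_acc[:3, :3] = 0.9
toy_acc[3:, 3:] = 0.9
np.fill_diagonal(toy_acc, 1.0)

labels, medoids = cluster_hrtf_sets(toy_acc, k=2)
print("cluster labels:", labels)
print("central (medoid) sets:", medoids)
```

Under these assumptions, a localization network would then be trained only on the sets listed in `medoids`, which is one way to read the abstract's "central HRTF sets representative of their specific cluster".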
Source journal
Journal of the Audio Engineering Society (Engineering and Technology: Engineering, Multidisciplinary)
CiteScore: 3.50
Self-citation rate: 14.30%
Articles published: 53
Review time: 1 month
About the journal: The Journal of the Audio Engineering Society, the official publication of the AES, is the only peer-reviewed journal devoted exclusively to audio technology. Published 10 times each year, it is available to all AES members and subscribers. The Journal contains state-of-the-art technical papers and engineering reports; feature articles covering timely topics; pre- and post-convention reports on AES conventions and other society activities; news from AES sections around the world; Standards and Education Committee work; membership news, patents, new products, and newsworthy developments in the field of audio.