Spatial up-sampling of HRTF sets using generative adversarial networks: A pilot study

Impact factor: 1.3 · JCR Q3 (Engineering, Electrical & Electronic)
Pongsakorn Siripornpitak, Isaac Engel, Isaac Squires, Samuel J. Cooper, L. Picinali
Journal: Frontiers in Signal Processing
Published: 2022-08-23
DOI: https://doi.org/10.3389/frsip.2022.904398
Citations: 4

Abstract

Headphone-based spatial audio simulations rely on head-related transfer functions (HRTFs) to reconstruct the sound field at the entrance of the listener’s ears. An HRTF depends strongly on the listener’s specific anatomical structures, and it has been shown that virtual sounds recreated with someone else’s HRTF result in worse localisation accuracy and alter other subjective measures such as externalisation and realism. Acoustic measurement of the filtering effects generated by the ears, head and torso has proven to be one of the most reliable ways to obtain a personalised HRTF. However, this requires a dedicated and expensive setup, and is time-intensive. To simplify the measurement setup, and thereby improve the scalability of the process, we are exploring strategies to reduce the number of acoustic measurements without degrading the spatial resolution of the HRTF. Traditionally, spatial up-sampling of HRTF sets is achieved through barycentric interpolation or by employing the spherical harmonics framework. However, such methods often perform poorly when the provided HRTF data are spatially very sparse. This work investigates the use of generative adversarial networks (GANs) to tackle the up-sampling problem, offering an initial insight into the suitability of this technique. Numerical evaluations based on spectral magnitude error and perceptual model outputs are presented on single spatial dimensions, that is, considering sources positioned in only one of the three main planes: horizontal, median, and frontal. Results suggest that traditional HRTF interpolation methods perform better than the proposed GAN-based one when the distance between measurements is smaller than 90°, but for the sparsest conditions (i.e., one measurement every 120°–180°), the proposed approach outperforms the others.
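
To make the baseline and the error measure more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that, within a single plane, barycentric interpolation reduces to a linear blend of the two measured directions that bracket the target angle, that the blend is applied to magnitude spectra expressed in dB (phase is ignored), and that the spectral magnitude error is a root-mean-square dB difference across frequency bins (often called log-spectral distortion). The abstract does not state the exact formulas used, and the function names and toy values are hypothetical.

```python
import math


def interpolate_magnitudes(measured, target_deg):
    """Blend the two measured spectra that bracket target_deg on a circle.

    measured: dict mapping a measurement angle in degrees (0 <= angle < 360)
              to a list of magnitudes in dB, one per frequency bin.
    Assumption: interpolation is done on dB magnitudes only.
    """
    angles = sorted(measured)
    target = target_deg % 360.0
    # Nearest measured angle at or below the target, wrapping past 0 degrees.
    lower = max((a for a in angles if a <= target), default=angles[-1])
    # Nearest measured angle above the target, wrapping past 360 degrees.
    upper = min((a for a in angles if a > target), default=angles[0])
    # Angular gap between the two neighbours (360 if only one measurement).
    span = (upper - lower) % 360.0 or 360.0
    weight = ((target - lower) % 360.0) / span
    return [
        (1.0 - weight) * lo + weight * hi
        for lo, hi in zip(measured[lower], measured[upper])
    ]


def log_spectral_distortion(reference_db, estimate_db):
    """Root-mean-square difference, in dB, across frequency bins."""
    squared = [(r - e) ** 2 for r, e in zip(reference_db, estimate_db)]
    return math.sqrt(sum(squared) / len(squared))


# Toy horizontal-plane set with one measurement every 120 degrees
# (hypothetical values, two frequency bins per direction).
sparse_set = {
    0: [0.0, -6.0],
    120: [-12.0, 0.0],
    240: [-6.0, -6.0],
}

estimate_at_60 = interpolate_magnitudes(sparse_set, 60)
print(estimate_at_60)
print(log_spectral_distortion([0.0, 0.0], [3.0, 4.0]))
```

With measurements 120° apart, the estimate at 60° is simply the midpoint of two quite different spectra, which illustrates why such interpolation can degrade in the sparsest conditions the abstract describes.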