Towards Individualised Speech Enhancement: An SNR Preference Learning System for Multi-Modal Hearing Aids

Jasper Kirton-Wingate, Shafique Ahmed, M. Gogate, Yu-sheng Tsao, Amir Hussain
{"title":"Towards Individualised Speech Enhancement: An SNR Preference Learning System for Multi-Modal Hearing Aids","authors":"Jasper Kirton-Wingate, Shafique Ahmed, M. Gogate, Yu-sheng Tsao, Amir Hussain","doi":"10.1109/ICASSPW59220.2023.10193122","DOIUrl":null,"url":null,"abstract":"Since the advent of deep learning (DL), speech enhancement (SE) models have performed well under a variety of noise conditions. However, such systems may still introduce sonic artefacts, sound unnatural, and restrict the ability for a user to hear ambient sound which may be of importance. Hearing Aid (HA) users may wish to customise their SE systems to suit their personal preferences and day-to-day lifestyle. In this paper, we introduce a preference learning based SE (PLSE) model for future multi-modal HAs that can contextually exploit audio and visual information to improve listening comfort (LC). The proposed system estimates the Signal-to-noise ratio (SNR) as a basic objective speech quality measure which quantifies the relative amount of background noise present in speech, and directly correlates to the intelligibility of the signal. This is used alongside a preference elicitation framework which learns a predictive function to determine the target SNR. The system is novel, scaling the output of an AudioVisual (AV) DL-based SE model to provide HA users with individualised SE. Preliminary results support the hypothesis of improving the overall subjective LC, without significantly impeding the speech intelligibility.","PeriodicalId":158726,"journal":{"name":"2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICASSPW59220.2023.10193122","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Since the advent of deep learning (DL), speech enhancement (SE) models have performed well under a variety of noise conditions. However, such systems may still introduce sonic artefacts, sound unnatural, and restrict the ability for a user to hear ambient sound which may be of importance. Hearing Aid (HA) users may wish to customise their SE systems to suit their personal preferences and day-to-day lifestyle. In this paper, we introduce a preference learning based SE (PLSE) model for future multi-modal HAs that can contextually exploit audio and visual information to improve listening comfort (LC). The proposed system estimates the Signal-to-noise ratio (SNR) as a basic objective speech quality measure which quantifies the relative amount of background noise present in speech, and directly correlates to the intelligibility of the signal. This is used alongside a preference elicitation framework which learns a predictive function to determine the target SNR. The system is novel, scaling the output of an AudioVisual (AV) DL-based SE model to provide HA users with individualised SE. Preliminary results support the hypothesis of improving the overall subjective LC, without significantly impeding the speech intelligibility.
面向个性化语音增强:多模态助听器的信噪比偏好学习系统
自深度学习(DL)出现以来,语音增强(SE)模型在各种噪声条件下都表现良好。然而,这样的系统仍然可能引入声音伪影,声音不自然,并限制用户听到可能重要的环境声音的能力。助听器使用者可根据个人喜好和日常生活方式,定制助听器系统。在本文中,我们为未来的多模态HAs引入了一种基于偏好学习的SE (PLSE)模型,该模型可以上下文化地利用音频和视觉信息来提高听力舒适度(LC)。该系统估计信噪比(SNR)作为一种基本的客观语音质量度量,它量化了语音中存在的背景噪声的相对量,并与信号的可理解性直接相关。这与偏好激发框架一起使用,该框架学习预测函数以确定目标信噪比。该系统新颖,可扩展基于视听(AV) dl的SE模型的输出,为HA用户提供个性化的SE。初步结果支持了在不显著影响语音可理解性的前提下提高整体主观LC的假设。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信