Towards Individualised Speech Enhancement: An SNR Preference Learning System for Multi-Modal Hearing Aids

2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW) Pub Date : 2023-06-04 DOI:10.1109/ICASSPW59220.2023.10193122

Jasper Kirton-Wingate, Shafique Ahmed, M. Gogate, Yu-sheng Tsao, Amir Hussain

Abstract

Since the advent of deep learning (DL), speech enhancement (SE) models have performed well under a variety of noise conditions. However, such systems may still introduce sonic artefacts, sound unnatural, and restrict a user's ability to hear ambient sounds that may be important. Hearing aid (HA) users may wish to customise their SE systems to suit their personal preferences and day-to-day lifestyle. In this paper, we introduce a preference learning based SE (PLSE) model for future multi-modal HAs that can contextually exploit audio and visual information to improve listening comfort (LC). The proposed system estimates the signal-to-noise ratio (SNR) as a basic objective speech quality measure, which quantifies the relative amount of background noise present in speech and correlates directly with the intelligibility of the signal. This estimate is used alongside a preference elicitation framework, which learns a predictive function to determine the target SNR. The novelty of the system lies in scaling the output of an audio-visual (AV) DL-based SE model to provide HA users with individualised SE. Preliminary results support the hypothesis that the system improves overall subjective LC without significantly impeding speech intelligibility.
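The abstract does not specify how the SE output is scaled towards the target SNR. As a rough illustration only, the sketch below assumes one simple possibility: the enhanced speech is remixed with a gain-adjusted residual noise estimate so that the result sits at a chosen target SNR. The function names `snr_db` and `remix_to_target_snr`, the synthetic sine and Gaussian signals, and the 10 dB target are all hypothetical and are not taken from the paper.

```python
import numpy as np


def snr_db(speech, noise):
    """SNR in dB: 10 * log10(speech power / noise power)."""
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    return 10.0 * np.log10(p_speech / p_noise)


def remix_to_target_snr(speech_est, noise_est, target_snr_db):
    """Scale the noise estimate so that the remix has the requested SNR.

    Assumption (not from the paper): the target SNR is reached by
    attenuating or boosting the residual noise and adding it back
    to the enhanced speech.
    """
    current = snr_db(speech_est, noise_est)
    # Applying gain g to the noise changes the SNR by -20*log10(g),
    # so g = 10 ** ((current - target) / 20) lands on the target.
    gain = 10.0 ** ((current - target_snr_db) / 20.0)
    return speech_est + gain * noise_est, gain


# Synthetic stand-ins: a sine as "enhanced speech", white noise as "residual noise".
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
speech = np.sin(2.0 * np.pi * 220.0 * t)
noise = 0.5 * rng.standard_normal(t.size)

# In a PLSE-style system the target would come from the learned preference
# function; here it is simply fixed at a hypothetical 10 dB.
mix, gain = remix_to_target_snr(speech, noise, target_snr_db=10.0)
achieved = snr_db(speech, gain * noise)
```

Adjusting only the noise gain leaves the speech component untouched, which is one way a user-preferred amount of ambient sound could be retained; the actual system may work differently.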
