Towards Individualised Speech Enhancement: An SNR Preference Learning System for Multi-Modal Hearing Aids

2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW) · Pub Date: 2023-06-04 · DOI: 10.1109/ICASSPW59220.2023.10193122

Jasper Kirton-Wingate, Shafique Ahmed, M. Gogate, Yu-sheng Tsao, Amir Hussain

Abstract

Since the advent of deep learning (DL), speech enhancement (SE) models have performed well under a variety of noise conditions. However, such systems may still introduce sonic artefacts, sound unnatural, and restrict a user's ability to hear ambient sounds that may be important. Hearing aid (HA) users may wish to customise their SE systems to suit their personal preferences and day-to-day lifestyle. In this paper, we introduce a preference-learning-based SE (PLSE) model for future multi-modal HAs that can contextually exploit audio and visual information to improve listening comfort (LC). The proposed system estimates the signal-to-noise ratio (SNR), a basic objective speech quality measure that quantifies the relative amount of background noise present in speech and correlates directly with the intelligibility of the signal. The SNR estimate is used alongside a preference elicitation framework, which learns a predictive function to determine the target SNR. The novelty of the system lies in scaling the output of an audio-visual (AV) DL-based SE model to provide HA users with individualised SE. Preliminary results support the hypothesis that the system improves overall subjective LC without significantly impeding speech intelligibility.
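The abstract does not say how the SE output is scaled towards the target SNR, so the following is only a minimal sketch of one plausible reading. It assumes that separate estimates of the enhanced speech and the residual noise are available and that the noise is re-added with a gain chosen to hit the preferred SNR. The function names (`snr_db`, `remix_to_target_snr`) and the toy signals are hypothetical and do not come from the paper.

```python
import math

def power(x):
    """Mean power of a signal given as a sequence of samples."""
    return sum(v * v for v in x) / len(x)

def snr_db(speech, noise):
    """SNR in dB, defined as 10 * log10(P_speech / P_noise)."""
    return 10.0 * math.log10(power(speech) / power(noise))

def remix_to_target_snr(speech, noise, target_db):
    """Scale the noise so that speech + gain * noise sits at target_db.

    Assumption: the target SNR would come from a learned preference
    function; here it is simply passed in as a number.
    """
    # Scaling noise amplitude by `gain` lowers the SNR by 20*log10(gain) dB.
    gain = 10.0 ** ((snr_db(speech, noise) - target_db) / 20.0)
    mixed = [s + gain * n for s, n in zip(speech, noise)]
    return mixed, gain

# Toy signals standing in for the enhanced speech and the residual noise.
speech = [math.sin(2 * math.pi * 5 * t / 100) for t in range(100)]
noise = [0.5 * math.sin(2 * math.pi * 23 * t / 100 + 1.0) for t in range(100)]

mixed, gain = remix_to_target_snr(speech, noise, target_db=12.0)
achieved = snr_db(speech, [gain * n for n in noise])
```

In this sketch a target SNR above the current one attenuates the background (gain below 1), while a lower target lets more ambient sound through, which is the trade-off between intelligibility and awareness of ambient sound that the abstract describes.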
