Cognitively inspired speech processing for multimodal hearing technology

Andrew Abel, A. Hussain, B. Luo
2014 IEEE Symposium on Computational Intelligence in Healthcare and e-health (CICARE), December 2014
DOI: 10.1109/CICARE.2014.7007834
Citations: 5

Abstract

In recent years, speech processing has made increasing use of the links between the different domains in which human communication is produced. Work by the authors and others has demonstrated that audio and visual information, when intelligently integrated, can be used for speech enhancement. This advance makes the use of visual information in hearing aids and assistive listening devices increasingly viable. One issue that is rarely explored is how a multimodal system copes with variations in the quality and availability of its input data, for example when a speaker covers their face while talking, or when several speakers take part in a conversation. A hearing device would be expected to cope with such changes in its environment by switching between different programmes and settings. We present the ChallengAV audiovisual corpus and use it to evaluate a novel fuzzy-logic-based audiovisual switching system, designed to be used as part of a next-generation adaptive, autonomous, context-aware hearing system. Initial results show that the detectors can determine environmental conditions and respond appropriately, demonstrating the potential of such an adaptive multimodal system as part of a state-of-the-art hearing aid.