{"title":"Cognitively inspired speech processing for multimodal hearing technology","authors":"Andrew Abel, A. Hussain, B. Luo","doi":"10.1109/CICARE.2014.7007834","DOIUrl":null,"url":null,"abstract":"In recent years, the link between the various human communication production domains has become more widely utilised in the field of speech processing. Work by the authors and others has demonstrated that intelligently integrated audio and visual information can be used for speech enhancement. This advance in technology means that the use of visual information as part of hearing aids or assistive listening devices is becoming ever more viable. One issue that is not commonly explored is how a multimodal system copes with variations in data quality and availability, such as a speaker covering their face while talking, or the existence of multiple speakers in a conversational scenario, an issue that a hearing device would be expected to cope with by switching between different programmes and settings to adapt to changes in the environment. We present the ChallengAV audiovisual corpus, which is used to evaluate a novel fuzzy logic based audiovisual switching system, designed to be used as part of a next-generation adaptive, autonomous, context aware hearing system. Initial results show that the detectors are capable of determining environmental conditions and responding appropriately, demonstrating the potential of such an adaptive multimodal system as part of a state of the art hearing aid device.","PeriodicalId":120730,"journal":{"name":"2014 IEEE Symposium on Computational Intelligence in Healthcare and e-health (CICARE)","volume":"20 6","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 IEEE Symposium on Computational Intelligence in Healthcare and e-health (CICARE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CICARE.2014.7007834","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 5
Abstract
In recent years, the link between the various human communication production domains has become more widely utilised in the field of speech processing. Work by the authors and others has demonstrated that intelligently integrated audio and visual information can be used for speech enhancement. This advance means that the use of visual information as part of hearing aids or assistive listening devices is becoming ever more viable. One issue that is not commonly explored is how a multimodal system copes with variations in data quality and availability, such as a speaker covering their face while talking, or the presence of multiple speakers in a conversational scenario. A hearing device would be expected to cope with such changes by switching between different programmes and settings to adapt to the environment. We present the ChallengAV audiovisual corpus, which is used to evaluate a novel fuzzy logic-based audiovisual switching system, designed to be used as part of a next-generation adaptive, autonomous, context-aware hearing system. Initial results show that the detectors are capable of determining environmental conditions and responding appropriately, demonstrating the potential of such an adaptive multimodal system as part of a state-of-the-art hearing aid.
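To make the idea of fuzzy logic-based audiovisual switching concrete, the sketch below shows one minimal way such a decision could be structured. The abstract does not specify the paper's actual detectors, membership functions, or rule base, so the inputs (an estimated acoustic SNR and a face-visibility confidence), the triangular membership breakpoints, and the two rules here are all illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of a fuzzy-logic audiovisual mode switcher.
# All input names, membership breakpoints, and rules are hypothetical;
# the paper's actual detector design is not described in the abstract.

def tri(x, a, b, c):
    """Triangular membership function: rises from a, peaks at b, falls to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def switch_mode(snr_db, face_visibility):
    """
    snr_db: estimated acoustic SNR in dB (hypothetical audio detector output).
    face_visibility: confidence in [0, 1] that the speaker's face is visible
        (hypothetical visual detector output, e.g. from face tracking).
    Returns the hearing-device programme to activate.
    """
    # Fuzzify the inputs (breakpoints chosen for illustration only).
    noisy = tri(snr_db, -10.0, 0.0, 10.0)
    clean = tri(snr_db, 5.0, 20.0, 35.0)
    occluded = tri(face_visibility, -0.5, 0.0, 0.5)
    visible = tri(face_visibility, 0.4, 1.0, 1.5)

    # Rule strengths: AND as min, OR as max.
    # Rule 1: clean audio OR an occluded face -> rely on audio alone.
    # Rule 2: noisy audio AND a visible face -> enable audiovisual enhancement.
    rules = {
        "audio_only": max(clean, occluded),
        "audiovisual": min(noisy, visible),
    }
    return max(rules, key=rules.get)

if __name__ == "__main__":
    print(switch_mode(snr_db=2.0, face_visibility=0.9))   # noisy + visible -> audiovisual
    print(switch_mode(snr_db=25.0, face_visibility=0.1))  # clean audio -> audio_only
```

A real system of the kind the abstract describes would defuzzify over more programmes and smooth the output over time to avoid rapid toggling as conditions fluctuate, but the core pattern (fuzzified environmental detectors feeding a rule base that selects a device setting) is the same.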