Spectral–temporal saliency masks and modulation tensorgrams for generalizable COVID-19 detection

Impact factor: 3.1 | Zone 3 (Computer Science) | JCR Q2: Computer Science, Artificial Intelligence
Yi Zhu, Tiago H. Falk
Journal: Computer Speech and Language, volume 86, Article 101620
Published: 2024-02-01
DOI: 10.1016/j.csl.2024.101620
URL: https://www.sciencedirect.com/science/article/pii/S0885230824000032
Open access: no
Citations: 0

Abstract


Speech-based COVID-19 detection systems have gained popularity because they are an easy-to-use, low-cost solution well suited to at-home, long-term monitoring of patients with persistent symptoms. Recently, however, the limited ability of existing deep neural network based systems to generalize to unseen datasets has been raised as a serious concern, as has their limited interpretability. In this study, we aim to develop an interpretable and generalizable speech-based COVID-19 detection system. First, we propose using a 3-dimensional modulation frequency tensor (called the modulation tensorgram representation, MTR) as input to a convolutional recurrent neural network for COVID-19 detection. The MTR is known to capture long-term dynamics of speech correlated with articulation and respiration, making it a potential candidate for characterizing COVID-19 speech. The customized network exploits both the spectral and temporal patterns in the MTR to learn the underlying COVID-19 speech pattern. Next, we design a spectro-temporal saliency mask that aggregates the regions of the MTR related to COVID-19, further improving the generalizability and interpretability of the model. Experiments on three public datasets show that the proposed solution consistently outperforms two benchmark systems in within-, across-, and unseen-dataset tests. The learned salient regions are shown to correlate with whispered speech and vocal hoarseness, which explains the increased generalizability. Furthermore, our model relies on a small number of parameters, making it a promising solution for on-device remote monitoring of COVID-19 infection.
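
To make the idea of a 3-dimensional modulation frequency tensor more concrete, the following is a minimal sketch, not the authors' implementation. It assumes one common construction: a magnitude spectrogram is computed first, then a second Fourier transform is taken along time within sliding blocks, giving axes of (time block, acoustic frequency, modulation frequency). The function names `stft_magnitude` and `modulation_tensorgram`, the frame and block sizes, and the synthetic test signal are all illustrative assumptions.

```python
import numpy as np


def stft_magnitude(x, frame_len=256, hop=64):
    """Magnitude spectrogram with shape (n_frames, frame_len // 2 + 1)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([
        x[i * hop: i * hop + frame_len] * window for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1))


def modulation_tensorgram(x, frame_len=256, hop=64, block_len=32, block_hop=16):
    """Hypothetical helper: tensor of shape
    (n_blocks, n_acoustic_bins, n_modulation_bins)."""
    spec = stft_magnitude(x, frame_len, hop)      # (frames, acoustic bins)
    n_blocks = 1 + (spec.shape[0] - block_len) // block_hop
    window = np.hanning(block_len)[:, None]
    blocks = []
    for b in range(n_blocks):
        seg = spec[b * block_hop: b * block_hop + block_len] * window
        # Second FFT along the time axis: how fast the envelope of each
        # acoustic frequency bin fluctuates within this block.
        mod = np.abs(np.fft.rfft(seg, axis=0))    # (modulation bins, acoustic bins)
        blocks.append(mod.T)                      # (acoustic bins, modulation bins)
    return np.stack(blocks)


# Synthetic input (an assumption for illustration): a 1 kHz tone whose
# amplitude is modulated at 8 Hz, one second long, sampled at 16 kHz.
fs = 16000
t = np.arange(fs) / fs
x = (1 + 0.8 * np.sin(2 * np.pi * 8 * t)) * np.sin(2 * np.pi * 1000 * t)

mtr = modulation_tensorgram(x)
print(mtr.shape)
```

A tensor of this shape could be fed to a convolutional recurrent network by treating the first axis as the sequence dimension; the paper's actual preprocessing and network configuration may differ from this sketch.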

Source journal: Computer Speech and Language (Engineering and Technology; Computer Science: Artificial Intelligence)
CiteScore: 11.30
Self-citation rate: 4.70%
Articles published: 80
Review time: 22.9 weeks
About the journal: Computer Speech & Language publishes reports of original research related to the recognition, understanding, production, coding and mining of speech and language. The speech and language sciences have a long history, but it is only relatively recently that large-scale implementation of and experimentation with complex models of speech and language processing has become feasible. Such research is often carried out somewhat separately by practitioners of artificial intelligence, computer science, electronic engineering, information retrieval, linguistics, phonetics, or psychology.