COVID-19 Detection Exploiting Self-Supervised Learning Representations of Respiratory Sounds

Adria Mallol-Ragolta, Shuo Liu, B. Schuller
{"title":"COVID-19 Detection Exploiting Self-Supervised Learning Representations of Respiratory Sounds","authors":"Adria Mallol-Ragolta, Shuo Liu, B. Schuller","doi":"10.1109/BHI56158.2022.9926967","DOIUrl":null,"url":null,"abstract":"In this work, we focus on the automatic detection of COVID-19 patients from the analysis of cough, breath, and speech samples. Our goal is to investigate the suitability of Self-Supervised Learning (SSL) representations extracted using Wav2Vec 2.0 for the task at hand. For this, in addition to the SSL representations, the models trained exploit the Low-Level Descriptors (LLD) of the eGeMAPS feature set, and Mel-spectrogram coefficients. The extracted representations are analysed using Convolutional Neural Networks (CNN) reinforced with contextual attention. Our experiments are performed using the data released as part of the Second Diagnosing COVID-19 using Acoustics (DiCOVA) Challenge, and we use the Area Under the Curve (AUC) as the evaluation metric. When using the CNNs without contextual attention, the multi-type model exploiting the SSL Wav2Vec 2.0 representations from the cough, breath, and speech sounds scores the highest AUC, 80.37 %. When reinforcing the embedded representations learnt with contextual attention, the AUC obtained using this same model slightly decreases to 80.01 %. The best performance on the test set is obtained with a multi-type model fusing the embedded representations extracted from the LLDs of the cough, breath, and speech samples and reinforced using contextual attention, scoring an AUC of 81.27 %.","PeriodicalId":347210,"journal":{"name":"2022 IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/BHI56158.2022.9926967","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

In this work, we focus on the automatic detection of COVID-19 patients from the analysis of cough, breath, and speech samples. Our goal is to investigate the suitability of Self-Supervised Learning (SSL) representations extracted using Wav2Vec 2.0 for the task at hand. For this, in addition to the SSL representations, the trained models exploit the Low-Level Descriptors (LLDs) of the eGeMAPS feature set and Mel-spectrogram coefficients. The extracted representations are analysed using Convolutional Neural Networks (CNNs) reinforced with contextual attention. Our experiments are performed using the data released as part of the Second Diagnosing COVID-19 using Acoustics (DiCOVA) Challenge, and we use the Area Under the Curve (AUC) as the evaluation metric. When using the CNNs without contextual attention, the multi-type model exploiting the SSL Wav2Vec 2.0 representations from the cough, breath, and speech sounds scores the highest AUC, 80.37%. When the learnt embedded representations are reinforced with contextual attention, the AUC obtained with this same model decreases slightly to 80.01%. The best performance on the test set is obtained with a multi-type model fusing the embedded representations extracted from the LLDs of the cough, breath, and speech samples, reinforced using contextual attention, scoring an AUC of 81.27%.
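To make the pipeline concrete, below is a minimal sketch of how such a system could be assembled, assuming a HuggingFace facebook/wav2vec2-base checkpoint for the SSL representations and a simple additive attention-pooling head. The checkpoint, layer sizes, and attention formulation are illustrative assumptions, not the authors' released configuration.

```python
# Minimal sketch of the pipeline described in the abstract: Wav2Vec 2.0 SSL
# features, a CNN encoder reinforced with contextual attention, and AUC
# evaluation. All hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model
from sklearn.metrics import roc_auc_score

# 1) SSL representations from a pretrained Wav2Vec 2.0 model (assumed checkpoint).
extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
ssl_model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base").eval()

def wav2vec2_features(waveform_16k: torch.Tensor) -> torch.Tensor:
    """Return a (frames, 768) sequence of SSL representations for one recording."""
    inputs = extractor(waveform_16k.numpy(), sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        return ssl_model(**inputs).last_hidden_state.squeeze(0)

# 2) CNN encoder with (assumed additive) contextual attention pooling.
class AttentiveCNN(nn.Module):
    def __init__(self, in_dim: int = 768, channels: int = 128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_dim, channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.att = nn.Linear(channels, 1)  # frame-level attention scores
        self.clf = nn.Linear(channels, 1)  # binary COVID-19 logit

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, in_dim)
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)  # (batch, frames, channels)
        w = torch.softmax(self.att(h), dim=1)             # contextual attention weights
        pooled = (w * h).sum(dim=1)                       # attention-weighted pooling
        return self.clf(pooled).squeeze(-1)               # (batch,)

# 3) AUC as the evaluation metric, following the DiCOVA Challenge protocol.
model = AttentiveCNN()
feats = wav2vec2_features(torch.randn(16_000)).unsqueeze(0)  # one dummy 1 s clip
logits = model(feats)
# With real labels and logits collected over the evaluation set:
# print("AUC:", roc_auc_score(labels, torch.sigmoid(logits).detach().numpy()))
```

The eGeMAPS LLDs and Mel-spectrogram coefficients mentioned in the abstract could be obtained analogously, for instance with the opensmile Python package (FeatureSet.eGeMAPSv02 at FeatureLevel.LowLevelDescriptors) and torchaudio.transforms.MelSpectrogram, and fed to the same encoder after adjusting in_dim; a multi-type model in the sense above would fuse the per-sound (cough, breath, speech) embeddings before the classifier.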