COVID-19 Detection Exploiting Self-Supervised Learning Representations of Respiratory Sounds

Adria Mallol-Ragolta, Shuo Liu, B. Schuller
{"title":"COVID-19 Detection Exploiting Self-Supervised Learning Representations of Respiratory Sounds","authors":"Adria Mallol-Ragolta, Shuo Liu, B. Schuller","doi":"10.1109/BHI56158.2022.9926967","DOIUrl":null,"url":null,"abstract":"In this work, we focus on the automatic detection of COVID-19 patients from the analysis of cough, breath, and speech samples. Our goal is to investigate the suitability of Self-Supervised Learning (SSL) representations extracted using Wav2Vec 2.0 for the task at hand. For this, in addition to the SSL representations, the models trained exploit the Low-Level Descriptors (LLD) of the eGeMAPS feature set, and Mel-spectrogram coefficients. The extracted representations are analysed using Convolutional Neural Networks (CNN) reinforced with contextual attention. Our experiments are performed using the data released as part of the Second Diagnosing COVID-19 using Acoustics (DiCOVA) Challenge, and we use the Area Under the Curve (AUC) as the evaluation metric. When using the CNNs without contextual attention, the multi-type model exploiting the SSL Wav2Vec 2.0 representations from the cough, breath, and speech sounds scores the highest AUC, 80.37 %. When reinforcing the embedded representations learnt with contextual attention, the AUC obtained using this same model slightly decreases to 80.01 %. The best performance on the test set is obtained with a multi-type model fusing the embedded representations extracted from the LLDs of the cough, breath, and speech samples and reinforced using contextual attention, scoring an AUC of 81.27 %.","PeriodicalId":347210,"journal":{"name":"2022 IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/BHI56158.2022.9926967","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

In this work, we focus on the automatic detection of COVID-19 patients from the analysis of cough, breath, and speech samples. Our goal is to investigate the suitability of Self-Supervised Learning (SSL) representations extracted using Wav2Vec 2.0 for the task at hand. For this, in addition to the SSL representations, the trained models exploit the Low-Level Descriptors (LLDs) of the eGeMAPS feature set and Mel-spectrogram coefficients. The extracted representations are analysed using Convolutional Neural Networks (CNNs) reinforced with contextual attention. Our experiments are performed using the data released as part of the Second Diagnosing COVID-19 using Acoustics (DiCOVA) Challenge, and we use the Area Under the Curve (AUC) as the evaluation metric. When using the CNNs without contextual attention, the multi-type model exploiting the SSL Wav2Vec 2.0 representations from the cough, breath, and speech sounds scores the highest AUC, 80.37%. When the learnt embedded representations are reinforced with contextual attention, the AUC obtained with this same model decreases slightly to 80.01%. The best performance on the test set is obtained with a multi-type model fusing the embedded representations extracted from the LLDs of the cough, breath, and speech samples, reinforced using contextual attention, scoring an AUC of 81.27%.
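To make the pipeline concrete, below is a minimal sketch of how such a system could be assembled, assuming a HuggingFace facebook/wav2vec2-base checkpoint for the SSL representations and a simple additive attention-pooling head. The checkpoint, layer sizes, and attention formulation are illustrative assumptions, not the authors' released configuration.

```python
# Minimal sketch of the pipeline described in the abstract: Wav2Vec 2.0 SSL
# features, a CNN encoder reinforced with contextual attention, and AUC
# evaluation. All hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model
from sklearn.metrics import roc_auc_score

# 1) SSL representations from a pretrained Wav2Vec 2.0 model (assumed checkpoint).
extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
ssl_model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base").eval()

def wav2vec2_features(waveform_16k: torch.Tensor) -> torch.Tensor:
    """Return a (frames, 768) sequence of SSL representations for one recording."""
    inputs = extractor(waveform_16k.numpy(), sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        return ssl_model(**inputs).last_hidden_state.squeeze(0)

# 2) CNN encoder with (assumed additive) contextual attention pooling.
class AttentiveCNN(nn.Module):
    def __init__(self, in_dim: int = 768, channels: int = 128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_dim, channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.att = nn.Linear(channels, 1)  # frame-level attention scores
        self.clf = nn.Linear(channels, 1)  # binary COVID-19 logit

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, in_dim)
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)  # (batch, frames, channels)
        w = torch.softmax(self.att(h), dim=1)             # contextual attention weights
        pooled = (w * h).sum(dim=1)                       # attention-weighted pooling
        return self.clf(pooled).squeeze(-1)               # (batch,)

# 3) AUC as the evaluation metric, following the DiCOVA Challenge protocol.
model = AttentiveCNN()
feats = wav2vec2_features(torch.randn(16_000)).unsqueeze(0)  # one dummy 1 s clip
logits = model(feats)
# With real labels and logits collected over the evaluation set:
# print("AUC:", roc_auc_score(labels, torch.sigmoid(logits).detach().numpy()))
```

The eGeMAPS LLDs and Mel-spectrogram coefficients mentioned in the abstract could be obtained analogously, for instance with the opensmile Python package (FeatureSet.eGeMAPSv02 at FeatureLevel.LowLevelDescriptors) and torchaudio.transforms.MelSpectrogram, and fed to the same encoder after adjusting in_dim; a multi-type model in the sense above would fuse the per-sound (cough, breath, speech) embeddings before the classifier.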