Battling with the low-resource condition for snore sound recognition: introducing a meta-learning strategy

IF 2.4 · Zone 3, Computer Science

Journal on Audio Speech and Music Processing Pub Date : 2023-10-13 DOI:10.1186/s13636-023-00309-3

Jingtan Li, Mengkai Sun, Zhonghao Zhao, Xingcan Li, Gaigai Li, Chen Wu, Kun Qian, Bin Hu, Yoshiharu Yamamoto, Björn W. Schuller


Citations: 0

Abstract

Snoring affects 57 % of men, 40 % of women, and 27 % of children in the USA. Moreover, snoring is highly correlated with obstructive sleep apnoea (OSA), which is characterised by loud and frequent snoring. OSA is also closely associated with life-threatening conditions such as sudden cardiac arrest and is regarded as a serious medical condition. Preliminary studies have shown that, in the USA, OSA affects over 34 % of men and 14 % of women. In recent years, polysomnography has increasingly been used to diagnose OSA. However, because polysomnography is time-consuming and costly, intelligent audio analysis of snoring has emerged as an alternative. Given the clinical demand for identifying the excitation location of snoring, we use the Munich-Passau Snore Sound Corpus (MPSSC), which classifies the snoring excitation location into four categories. Nonetheless, the MPSSC remains a small dataset, owing to factors such as privacy concerns and the difficulty of accurate labelling. Indeed, accurately labelled medical data suitable for machine learning is often scarce, especially for rare diseases. We therefore use Model-Agnostic Meta-Learning (MAML), a small-sample method based on meta-learning, to classify snore signals with fewer resources. The experimental results indicate that, even when only the ESC-50 dataset (non-snoring sound signals) is used for meta-training, we achieve an unweighted average recall of 60.2 % on the test set after fine-tuning on just 36 snoring instances from the development partition of the MPSSC. Although this exceeds the baseline by only 4.4 %, it demonstrates that the model can outperform the baseline even when fine-tuned on only a few snoring instances. This suggests that MAML can effectively tackle the low-resource problem.
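The abstract reports its result as unweighted average recall (UAR), that is, the mean of the per-class recalls, which is the same quantity as macro-averaged recall. The following is a minimal sketch of that metric in plain Python, not the authors' evaluation code. The function name, the class labels "A" to "D", and the toy predictions are hypothetical and are not taken from the MPSSC.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (every class weighted equally)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # number of correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    # Recall is computed only for classes that occur in y_true.
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced four-class toy data (labels are placeholders).
y_true = ["A"] * 6 + ["B"] * 2 + ["C"] + ["D"]
y_pred = ["A"] * 6 + ["B", "A"] + ["A"] + ["D"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.3f}, accuracy = {accuracy:.3f}")
```

In this toy case the majority class dominates plain accuracy, whereas UAR penalises the class that is never recognised. This is one reason UAR is a common choice for small, imbalanced corpora.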


Source journal

Journal on Audio Speech and Music Processing (Engineering: Electrical and Electronic Engineering)

CiteScore

4.10

Self-citation rate

4.20%

Articles published

Journal description: The aim of the “EURASIP Journal on Audio, Speech, and Music Processing” is to bring together researchers, scientists, and engineers working on the theory and applications of processing various audio signals, with a specific focus on speech and music. The journal is intended as an interdisciplinary venue for the dissemination of all basic and applied aspects of speech communication and audio processing.