A novel extension of FixMatch using uncertainty for semi-supervised audio classification

Sascha Grollmisch, Estefanía Cano, Hanna Lukashevich, Jakob Abeßer
Science Talks, volume 10, Article 100364. Published 2024-06-01. DOI: 10.1016/j.sctalk.2024.100364
Full text: https://www.sciencedirect.com/science/article/pii/S2772569324000720

Abstract

Semi-supervised learning (SSL) is commonly used when annotated data is scarce but unlabeled data is readily available. In recent years, SSL has advanced considerably in computer vision, and methods such as FixMatch have been successfully adapted to audio classification tasks. However, a gap remains between SSL methods and fully supervised baselines trained with all labels available. In this work, we first investigate the quality of the pseudo-labels, i.e., the labels generated for unlabeled data, for musical instrument family classification and acoustic scene classification. Based on these insights, we propose and evaluate a novel extension of FixMatch that quantifies and takes into account the uncertainty of the pseudo-labels. Additionally, we highlight the problematic tradeoff between pseudo-label quality and quantity. Our results show that Monte-Carlo Dropout combined with temperature scaling improved the pseudo-label accuracy from 78.4% to 86.7% for instrument family classification and from 87.9% to 89.9% for acoustic scene classification. Although test-set accuracy improved from 71.0% to 72.1% and from 69.2% to 70.8%, respectively, a gap to the fully supervised baseline remains, leaving room for future work.
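
The abstract does not spell out how the uncertainty estimate enters the pseudo-labeling step, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that logits from several stochastic forward passes (dropout kept active) are temperature-scaled, converted to probabilities, and averaged, and that a pseudo-label is kept only if the averaged confidence reaches a threshold. The function names, the threshold of 0.95, and the toy logits are all hypothetical.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def select_pseudo_labels(mc_logits, temperature=1.0, threshold=0.95):
    """Pick pseudo-labels from Monte-Carlo Dropout logits (hypothetical helper).

    mc_logits has shape (passes, samples, classes): one set of logits per
    stochastic forward pass. Returns predicted labels, their averaged
    confidence, and a boolean mask of samples that pass the threshold.
    """
    probs = softmax(mc_logits, temperature).mean(axis=0)  # (samples, classes)
    labels = probs.argmax(axis=-1)
    confidence = probs.max(axis=-1)
    mask = confidence >= threshold
    return labels, confidence, mask


def quality_and_quantity(labels, mask, true_labels):
    """Pseudo-label accuracy on kept samples, and the fraction of samples kept."""
    quantity = float(mask.mean())
    if not mask.any():
        return float("nan"), quantity
    quality = float((labels[mask] == true_labels[mask]).mean())
    return quality, quantity


# Toy data (assumed): 2 dropout passes, 3 unlabeled samples, 2 classes.
mc_logits = np.array([
    [[8.0, 0.0], [8.0, 0.0], [2.0, 0.0]],  # pass 1
    [[8.0, 0.0], [0.0, 8.0], [2.0, 0.0]],  # pass 2
])
true_labels = np.array([0, 1, 0])  # known here only to measure quality

labels, confidence, mask = select_pseudo_labels(mc_logits)
quality, quantity = quality_and_quantity(labels, mask, true_labels)
```

In this toy setup, the first sample is kept because both passes agree with high confidence, the second is rejected because the passes disagree, and the third is rejected because its confidence stays below the threshold. Raising the threshold keeps fewer pseudo-labels but tends to keep more accurate ones, which is the quality versus quantity tradeoff the abstract refers to.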
