Enhancing Semi-Supervised Learning via Representative and Diverse Sample Selection

Qian Shao, Jiangrui Kang, Qiyuan Chen, Zepeng Li, Hongxia Xu, Yiwen Cao, Jiajuan Liang, Jian Wu
arXiv - CS - Machine Learning, published 2024-09-18
DOI: arxiv-2409.11653 (https://doi.org/arxiv-2409.11653)
Citations: 0

Abstract

Semi-Supervised Learning (SSL) has become a preferred paradigm in many deep learning tasks because it reduces the need for human labor. Previous studies primarily focus on effectively utilizing the labeled and unlabeled data to improve performance. However, we observe that the choice of which samples to label also significantly impacts performance, particularly under extremely low-budget settings. The sample selection task in SSL has long been under-explored. To fill this gap, we propose a Representative and Diverse Sample Selection approach (RDSS). By adopting a modified Frank-Wolfe algorithm to minimize a novel criterion, $\alpha$-Maximum Mean Discrepancy ($\alpha$-MMD), RDSS samples a representative and diverse subset of the unlabeled data for annotation. We demonstrate that minimizing $\alpha$-MMD enhances the generalization ability of low-budget learning. Experimental results show that RDSS consistently improves the performance of several popular SSL frameworks and outperforms the state-of-the-art sample selection approaches used in Active Learning (AL) and Semi-Supervised Active Learning (SSAL), even with constrained annotation budgets.
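The abstract does not define $\alpha$-MMD or the modified Frank-Wolfe procedure, so the following is only a rough sketch of the general idea, not the authors' method. It uses greedy kernel herding, a standard Frank-Wolfe-style procedure for reducing ordinary MMD between a selected subset and the full pool. The function names, the RBF kernel, the `gamma` bandwidth, and in particular the way `alpha` scales the representativeness term are all assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def herding_select(x, budget, alpha=1.0, gamma=1.0):
    """Greedily pick `budget` distinct row indices of x (kernel herding).

    At step t the chosen point maximizes
        alpha * mean_j k(x_i, x_j) - (1 / (t + 1)) * sum_{s in selected} k(x_i, x_s)
    The first term rewards points that resemble the whole pool
    (representativeness); the second penalizes points close to those
    already chosen (diversity). `alpha` is a hypothetical trade-off
    weight, not the paper's definition of alpha-MMD.
    """
    k = rbf_kernel(x, x, gamma)
    mean_embedding = k.mean(axis=1)        # similarity of each point to the pool
    sum_to_selected = np.zeros(len(x))     # running similarity to chosen points
    selected = []
    for t in range(budget):
        score = alpha * mean_embedding - sum_to_selected / (t + 1)
        if selected:
            score[selected] = -np.inf      # select without replacement
        i = int(np.argmax(score))
        selected.append(i)
        sum_to_selected += k[:, i]
    return selected


# Two tight, well-separated clusters of four points each.
cluster = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1]])
pool = np.vstack([cluster, cluster + 10.0])
chosen = herding_select(pool, budget=2)
```

With a budget of two on this toy pool, the second pick lands in the cluster that the first pick did not cover, which is the representative-plus-diverse behavior the abstract describes at a high level.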