Enhancing Semi-Supervised Learning via Representative and Diverse Sample Selection

Qian Shao, Jiangrui Kang, Qiyuan Chen, Zepeng Li, Hongxia Xu, Yiwen Cao, Jiajuan Liang, Jian Wu

arXiv - CS - Machine Learning, published 2024-09-18. DOI: arxiv-2409.11653 (https://doi.org/arxiv-2409.11653)
Citations: 0
Abstract
Semi-Supervised Learning (SSL) has become a preferred paradigm in many deep
learning tasks because it reduces the need for human labeling effort. Previous
studies primarily focus on effectively utilizing the labeled and unlabeled data
to improve performance. However, we observe that how samples are selected for
labeling also significantly impacts performance, particularly under extremely
low-budget settings. The sample selection task in SSL has long been
under-explored. To fill this gap, we propose a Representative and Diverse
Sample Selection approach (RDSS). By adopting a modified Frank-Wolfe algorithm
to minimize a novel criterion, $\alpha$-Maximum Mean Discrepancy ($\alpha$-MMD),
RDSS samples a representative and diverse subset for annotation from the
unlabeled data. We demonstrate that minimizing $\alpha$-MMD enhances the
generalization ability of low-budget learning. Experimental results show that
RDSS consistently improves the performance of several popular SSL frameworks
and outperforms state-of-the-art sample selection approaches used in Active
Learning (AL) and Semi-Supervised Active Learning (SSAL), even under
constrained annotation budgets.
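To make the selection idea concrete, here is a minimal sketch of greedy, Frank-Wolfe-style subset selection that minimizes the standard (unweighted) MMD between the chosen subset and the full unlabeled pool, implemented as kernel herding, which is a known Frank-Wolfe instance for MMD minimization. This is an illustration only: the paper's $\alpha$-MMD criterion and its modified Frank-Wolfe algorithm differ in details (the $\alpha$ trade-off between representativeness and diversity), and the function names, RBF kernel choice, and `gamma` parameter below are assumptions, not the authors' implementation.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Pairwise RBF kernel k(x, y) = exp(-gamma * ||x - y||^2).
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def greedy_mmd_select(X, budget, gamma=1.0):
    """Greedily pick `budget` indices from pool X (n, d) so the subset's
    kernel mean embedding tracks the pool's mean embedding, i.e. the
    subset has small MMD to the full unlabeled set (kernel herding).
    Hypothetical sketch; the paper's alpha-MMD adds a weighted
    representativeness/diversity trade-off on top of this objective."""
    K = rbf_kernel(X, X, gamma)       # (n, n) kernel matrix
    pool_embed = K.mean(axis=1)       # <k(x_i, .), mean embedding of pool>
    selected = []
    for t in range(budget):
        if selected:
            # Favor points close to the pool mean (representative)
            # but far from already chosen points (diverse).
            score = pool_embed - K[:, selected].sum(axis=1) / (t + 1)
        else:
            score = pool_embed.copy()
        score[selected] = -np.inf     # never re-pick a point
        selected.append(int(np.argmax(score)))
    return selected
```

With two well-separated Gaussian clusters in the pool, the first few selections typically land near the cluster centers and spread across clusters, which is exactly the "representative and diverse" behavior the abstract describes.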