Learning with Inadequate and Incorrect Supervision

Chen Gong, Hengmin Zhang, Jian Yang, D. Tao
{"title":"Learning with Inadequate and Incorrect Supervision","authors":"Chen Gong, Hengmin Zhang, Jian Yang, D. Tao","doi":"10.1109/ICDM.2017.110","DOIUrl":null,"url":null,"abstract":"Practically, we are often in the dilemma that the labeled data at hand are inadequate to train a reliable classifier, and more seriously, some of these labeled data may be mistakenly labeled due to the various human factors. Therefore, this paper proposes a novel semi-supervised learning paradigm that can handle both label insufficiency and label inaccuracy. To address label insufficiency, we use a graph to bridge the data points so that the label information can be propagated from the scarce labeled examples to unlabeled examples along the graph edges. To address label inaccuracy, Graph Trend Filtering (GTF) and Smooth Eigenbase Pursuit (SEP) are adopted to filter out the initial noisy labels. GTF penalizes the l_0 norm of label difference between connected examples in the graph and exhibits better local adaptivity than the traditional l_2 norm-based Laplacian smoother. SEP reconstructs the correct labels by emphasizing the leading eigenvectors of Laplacian matrix associated with small eigenvalues, as these eigenvectors reflect real label smoothness and carry rich class separation cues. We term our algorithm as \"Semi-supervised learning under Inadequate and Incorrect Supervision\" (SIIS). Thorough experimental results on image classification, text categorization, and speech recognition demonstrate that our SIIS is effective in label error correction, leading to superior performance to the state-of-the-art methods in the presence of label noise and label scarcity.","PeriodicalId":254086,"journal":{"name":"2017 IEEE International Conference on Data Mining (ICDM)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"27","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE International Conference on Data Mining (ICDM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDM.2017.110","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 27

Abstract

Practically, we are often in the dilemma that the labeled data at hand are inadequate to train a reliable classifier, and more seriously, some of these labeled data may be mistakenly labeled due to the various human factors. Therefore, this paper proposes a novel semi-supervised learning paradigm that can handle both label insufficiency and label inaccuracy. To address label insufficiency, we use a graph to bridge the data points so that the label information can be propagated from the scarce labeled examples to unlabeled examples along the graph edges. To address label inaccuracy, Graph Trend Filtering (GTF) and Smooth Eigenbase Pursuit (SEP) are adopted to filter out the initial noisy labels. GTF penalizes the l_0 norm of label difference between connected examples in the graph and exhibits better local adaptivity than the traditional l_2 norm-based Laplacian smoother. SEP reconstructs the correct labels by emphasizing the leading eigenvectors of Laplacian matrix associated with small eigenvalues, as these eigenvectors reflect real label smoothness and carry rich class separation cues. We term our algorithm as "Semi-supervised learning under Inadequate and Incorrect Supervision" (SIIS). Thorough experimental results on image classification, text categorization, and speech recognition demonstrate that our SIIS is effective in label error correction, leading to superior performance to the state-of-the-art methods in the presence of label noise and label scarcity.
监督不充分和不正确的学习
在实践中,我们经常会遇到这样的困境:手头的标注数据不足以训练出一个可靠的分类器,更严重的是,这些标注数据中的一些可能会因为各种人为因素而被错误地标注。因此,本文提出了一种新的半监督学习范式,可以同时处理标签不足和标签不准确。为了解决标签不足的问题,我们使用图来桥接数据点,以便标签信息可以沿着图边缘从稀缺的标记示例传播到未标记的示例。为了解决标签不准确的问题,采用了图趋势滤波(GTF)和平滑特征基追踪(SEP)来过滤掉初始噪声标签。GTF对图中连通样例之间标签差的l_0范数进行惩罚,比传统的基于l_2范数的拉普拉斯平滑具有更好的局部适应性。SEP通过强调与小特征值相关的拉普拉斯矩阵的前导特征向量来重建正确的标签,因为这些特征向量反映了真正的标签平滑性,并且携带了丰富的类分离线索。我们将我们的算法称为“不充分和不正确监督下的半监督学习”(SIIS)。在图像分类、文本分类和语音识别方面的全面实验结果表明,我们的SIIS在标签纠错方面是有效的,在标签噪声和标签稀缺的情况下,其性能优于最先进的方法。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信