Active learning for sparse Bayesian multilabel classification

Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Published: 2014-08-24. DOI: 10.1145/2623330.2623759

Deepak Vasisht, Andreas C. Damianou, M. Varma, Ashish Kapoor


Citations: 69

Abstract




We study the problem of active learning for multilabel classification. We focus on the real-world scenario where the average number of positive (relevant) labels per data point is small, leading to positive-label sparsity. Carrying out mutual information based near-optimal active learning in this setting is a challenging task, since the computational complexity involved is exponential in the total number of labels. We propose a novel inference algorithm for the sparse Bayesian multilabel model of [17]. The benefit of this alternate inference scheme is that it enables a natural approximation of the mutual information objective. We prove that the approximation leads to a solution identical to that of the exact optimization problem, but at a fraction of the optimization cost. This allows us to carry out efficient, non-myopic, and near-optimal active learning for sparse multilabel classification. Extensive experiments demonstrate the effectiveness of the method.
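The claim that exact information-based scoring is exponential in the number of labels can be made concrete with a toy calculation. The sketch below is not the paper's inference algorithm or its mutual-information approximation. It assumes, purely for illustration, that the L labels of a candidate point are independent Bernoulli variables with known positive-label probabilities, and it swaps in plain entropy-based uncertainty sampling as a simpler stand-in for the mutual-information objective. All function names (`bernoulli_entropy`, `joint_entropy_bruteforce`, `joint_entropy_factorized`, `select_query`) are hypothetical. Scoring a point by the entropy of its full label vector by brute force sums over all 2^L label configurations, while under the independence assumption the same value decomposes into L per-label terms.

```python
import itertools
import math
from typing import List, Sequence


def bernoulli_entropy(p: float) -> float:
    """Entropy (in nats) of one binary label that is positive with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def joint_entropy_bruteforce(marginals: Sequence[float]) -> float:
    """Entropy of the whole label vector, enumerating all 2**L configurations.

    ASSUMPTION (illustration only): labels are independent Bernoulli variables.
    The outer loop runs 2**L times, which is what makes exact scoring
    intractable once L reaches the hundreds or thousands.
    """
    total = 0.0
    for config in itertools.product((0, 1), repeat=len(marginals)):
        prob = 1.0
        for y, p in zip(config, marginals):
            prob *= p if y == 1 else 1.0 - p
        if prob > 0.0:
            total -= prob * math.log(prob)
    return total


def joint_entropy_factorized(marginals: Sequence[float]) -> float:
    """Same quantity in O(L) time; valid only under the independence assumption."""
    return sum(bernoulli_entropy(p) for p in marginals)


def select_query(candidates: List[Sequence[float]]) -> int:
    """Hypothetical greedy step: index of the unlabeled point whose label vector is most uncertain."""
    scores = [joint_entropy_factorized(m) for m in candidates]
    return max(range(len(candidates)), key=lambda i: scores[i])


if __name__ == "__main__":
    # Sparse labels: most positive-label probabilities are small.
    sparse_point = [0.02, 0.05, 0.01, 0.30, 0.03, 0.02, 0.04, 0.01]
    print(joint_entropy_bruteforce(sparse_point))  # sums 2**8 = 256 terms
    print(joint_entropy_factorized(sparse_point))  # sums 8 terms, same value
    pool = [[0.01, 0.02], [0.5, 0.5], [0.1, 0.9]]
    print(select_query(pool))  # → 1
```

In the paper's setting the labels are coupled through the sparse Bayesian model of [17], so this simple factorization does not apply directly; the example only shows why avoiding the 2^L enumeration matters.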

