Combining active learning and relevance vector machines for text classification

Catarina Silva, Bernardete Ribeiro
Sixth International Conference on Machine Learning and Applications (ICMLA 2007). Published 2007-12-13. DOI: 10.1109/ICMLA.2007.72 (https://doi.org/10.1109/ICMLA.2007.72)
Citation count: 1

Abstract

Relevance vector machines (RVM) have proven successful in many learning tasks, but they scale poorly in large applications. In many settings a large amount of unlabeled data is available, from which a learner can actively choose examples to integrate into the learning procedure. The aim is to improve performance while reducing the cost of data categorization. In this paper we propose an active learning RVM method based on the kernel trick. The underlying idea is to define a working space between the relevance vectors (RV) initially obtained from a small labeled data set and the new unlabeled examples, and to choose the most informative instances within it. Kernel distance metrics make it possible to define such a space and to add the more informative examples to the training set, increasing performance without significantly affecting the problem dimension. We detail the proposed method and give illustrative examples on the Reuters-21578 benchmark. Results show improved performance and scalability.
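The selection step described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not say which kernel or which selection rule is used, so the Gaussian (RBF) kernel, the `gamma` parameter, the rule "pick the unlabeled examples closest to any relevance vector", and the function names `rbf_kernel`, `kernel_distance` and `select_candidates` are all assumptions made for this example. The only part taken from the general kernel-trick setting is the feature-space distance d(x, z)^2 = k(x, x) - 2 k(x, z) + k(z, z).

```python
import numpy as np


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq = (
        np.sum(a * a, axis=1)[:, None]
        - 2.0 * (a @ b.T)
        + np.sum(b * b, axis=1)[None, :]
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-gamma * np.maximum(sq, 0.0))


def kernel_distance(a, b, gamma=1.0):
    """Feature-space distance: d(x, z)^2 = k(x, x) - 2 k(x, z) + k(z, z)."""
    # For the RBF kernel k(x, x) = 1, so d^2 simplifies to 2 - 2 k(x, z).
    d2 = 2.0 - 2.0 * rbf_kernel(a, b, gamma)
    return np.sqrt(np.maximum(d2, 0.0))


def select_candidates(relevance_vectors, unlabeled, n_select, gamma=1.0):
    """Indices of the n_select unlabeled rows nearest to any relevance vector.

    Assumption: "most informative" is read here as "closest to a relevance
    vector in kernel distance"; the source abstract does not state the rule.
    """
    dist = kernel_distance(unlabeled, relevance_vectors, gamma)
    nearest = dist.min(axis=1)  # distance to the closest relevance vector
    return np.argsort(nearest, kind="stable")[:n_select]


# Toy data: two relevance vectors and a pool of four unlabeled points.
rvs = np.array([[0.0, 0.0], [1.0, 1.0]])
pool = np.array([[0.1, 0.0], [5.0, 5.0], [0.9, 1.0], [3.0, 3.0]])
picked = select_candidates(rvs, pool, n_select=2)
print(picked)
```

In an active learning loop, the selected examples would then be labeled, added to the training set, and the RVM retrained. Because only a few examples are added per round, the training set grows slowly, which matches the abstract's point that the problem dimension is not significantly affected.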