Combining active learning and relevance vector machines for text classification

Sixth International Conference on Machine Learning and Applications (ICMLA 2007), Pub Date: 2007-12-13, DOI: 10.1109/ICMLA.2007.72

Catarina Silva, Bernardete Ribeiro

Citations: 1

Abstract

Relevance vector machines (RVM) have proven successful in many learning tasks. However, they scale poorly in large applications. In many settings there is a large amount of unlabeled data that a learner could actively select from and integrate into the learning procedure. The idea is to improve performance while reducing the cost of data categorization. In this paper we propose an active learning RVM method based on the kernel trick. The underpinning idea is to define a working space between the relevance vectors (RV) initially obtained from a small labeled data set and the new unlabeled examples, in which the most informative instances are chosen. Kernel distance metrics make it possible to define such a space and to add the more informative examples to the training set, improving performance without significantly increasing the problem dimension. We detail the proposed method and illustrate it with examples on the Reuters-21578 benchmark. The results show improved performance and scalability.
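
The abstract does not spell out the selection rule, so the following is only a minimal sketch of how a kernel distance between relevance vectors and unlabeled examples might be used to pick candidates. It relies on the standard identity d^2(x, z) = k(x, x) - 2 k(x, z) + k(z, z) for the squared distance in the kernel-induced feature space. The RBF kernel, the "closest to any relevance vector" criterion, and the names rbf_kernel, kernel_distance_sq and select_candidates are all assumptions made for illustration, not the authors' method.

```python
import numpy as np


def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(axis=1)[:, None] - 2.0 * A @ B.T + (B ** 2).sum(axis=1)[None, :]
    return np.exp(-gamma * np.maximum(sq, 0.0))


def kernel_distance_sq(A, B, kernel):
    """Squared feature-space distance: k(a, a) - 2 k(a, b) + k(b, b)."""
    k_aa = np.diag(kernel(A, A))[:, None]   # shape (n_a, 1)
    k_bb = np.diag(kernel(B, B))[None, :]   # shape (1, n_b)
    return k_aa - 2.0 * kernel(A, B) + k_bb


def select_candidates(relevance_vectors, unlabeled, n_select, kernel=rbf_kernel):
    """Return indices of the unlabeled examples nearest to any relevance vector.

    ASSUMPTION: "most informative" is taken to mean "smallest kernel distance
    to the closest relevance vector". The paper may use a different criterion.
    """
    d2 = kernel_distance_sq(unlabeled, relevance_vectors, kernel)
    nearest = d2.min(axis=1)                # distance to the closest RV
    return np.argsort(nearest)[:n_select]


# Toy data standing in for document feature vectors (hypothetical values).
rvs = np.array([[0.0, 0.0]])
pool = np.array([[3.0, 3.0], [0.1, 0.0], [1.0, 1.0]])
picked = select_candidates(rvs, pool, n_select=2)
print(picked)  # indices 1 and 2: the two pool points closest to the RV
```

In an active learning loop, the selected examples would then be labeled, added to the training set, and the RVM retrained; that loop is omitted here because the abstract gives no details about it.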

