Semantic Features for Multi-view Semi-supervised and Active Learning of Text Classification

2008 IEEE International Conference on Data Mining Workshops. Pub Date: 2008-12-15. DOI: 10.1109/ICDMW.2008.13

Shiliang Sun


Citations: 10

Abstract

For multi-view learning, existing methods usually train classifiers on the originally provided features, which ignores the latent correlation between different views. In this paper, semantic features that integrate information from multiple views are extracted for pattern representation. Canonical correlation analysis is used to learn the semantic spaces, and the semantic features are the projections of the original features onto the basis vectors of these spaces. We investigate the feasibility of semantic features in two learning paradigms: semi-supervised learning and active learning. Experiments on text classification with two state-of-the-art multi-view learning algorithms, co-training and co-testing, indicate that using semantic features can significantly improve performance.
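The projection step named in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the function names `cca_fit` and `semantic_features`, the ridge term `reg`, and the synthetic two-view data are all assumptions made here for illustration. It fits canonical correlation analysis on paired samples from two views and projects each view onto its learned basis vectors to obtain one set of "semantic features" per view.

```python
import numpy as np


def cca_fit(X, Y, n_components, reg=1e-3):
    """Fit regularized CCA on paired views X (n x dx) and Y (n x dy).

    Returns the view means, the basis vectors Wx and Wy, and the
    canonical correlations. `reg` is an assumed ridge term that keeps
    the covariance matrices invertible.
    """
    n = X.shape[0]
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    # Whiten each view with a Cholesky factor: Cxx = Lx Lx^T, Cyy = Ly Ly^T.
    Lx = np.linalg.cholesky(Cxx)
    Ly = np.linalg.cholesky(Cyy)
    # T = Lx^{-1} Cxy Ly^{-T}; its singular values are the canonical correlations.
    T = np.linalg.solve(Lx, Cxy)
    T = np.linalg.solve(Ly, T.T).T
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    # Map the singular vectors back to the original feature spaces.
    Wx = np.linalg.solve(Lx.T, U[:, :n_components])
    Wy = np.linalg.solve(Ly.T, Vt.T[:, :n_components])
    return mx, my, Wx, Wy, s[:n_components]


def semantic_features(X, mean, W):
    """Project original features onto the learned basis vectors."""
    return (X - mean) @ W


# Synthetic stand-in for two views of the same documents (assumed data):
# both views are noisy linear functions of a shared 2-dimensional latent signal.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
view1 = latent @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(200, 5))
view2 = latent @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(200, 4))

mx, my, Wx, Wy, corrs = cca_fit(view1, view2, n_components=2)
Z1 = semantic_features(view1, mx, Wx)  # semantic features for view 1
Z2 = semantic_features(view2, my, Wy)  # semantic features for view 2
print("canonical correlations:", corrs)
```

In a multi-view learner such as co-training or co-testing, `Z1` and `Z2` would stand in for the original per-view features when training the per-view classifiers. How the paper combines or selects the projected dimensions is not stated in the abstract, so that choice is left open here.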


Source journal

2008 IEEE International Conference on Data Mining Workshops

Self-citation rate

0.00%

Publication count