基于种子亲和性传播和一致性算法的半监督文档聚类

R. Radha, T. T. Mirnalinee, T. Trueman
{"title":"基于种子亲和性传播和一致性算法的半监督文档聚类","authors":"R. Radha, T. T. Mirnalinee, T. Trueman","doi":"10.1109/ICRTIT.2012.6206802","DOIUrl":null,"url":null,"abstract":"Domain adaptation is the process of transferring the knowledge to a different domain from a source domain but they are related. In this paper, we first apply `Consensus Regularization' based algorithm to merge multiple source domain to a single source domain. Then we propose multi-domain adaptation in document clustering using Seeds affinity propagation and Consensus Regularization Algorithm. A semi-supervised document clustering algorithm, called Seeds Affinity Propagation (SAP) is applied based on an effective clustering algorithm Affinity Propagation (AP). The labeled and unlabeled documents are preprocessed through various processes such as stop words removal, word stemming and finding word frequency and given as the input. After pre-processing, structured documents are obtained. Tri-set Computation, a feature extraction technique is used to find out the features through Co-feature set, Unilateral feature set and Significant Co-feature set methods. Then calculate the similarity measure of the documents and assigning the label to the documents if they are matched. Finally clustered documents are obtained through seeds affinity propagation via similarity measurement. Further the performance of the algorithm can be evaluated and improved.","PeriodicalId":191151,"journal":{"name":"2012 International Conference on Recent Trends in Information Technology","volume":"55 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Semi-supervised document clustering using Seeds affinity propagation and consensus algorithm in multi-domain settings\",\"authors\":\"R. Radha, T. T. Mirnalinee, T. Trueman\",\"doi\":\"10.1109/ICRTIT.2012.6206802\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Domain adaptation is the process of transferring the knowledge to a different domain from a source domain but they are related. In this paper, we first apply `Consensus Regularization' based algorithm to merge multiple source domain to a single source domain. Then we propose multi-domain adaptation in document clustering using Seeds affinity propagation and Consensus Regularization Algorithm. A semi-supervised document clustering algorithm, called Seeds Affinity Propagation (SAP) is applied based on an effective clustering algorithm Affinity Propagation (AP). The labeled and unlabeled documents are preprocessed through various processes such as stop words removal, word stemming and finding word frequency and given as the input. After pre-processing, structured documents are obtained. Tri-set Computation, a feature extraction technique is used to find out the features through Co-feature set, Unilateral feature set and Significant Co-feature set methods. Then calculate the similarity measure of the documents and assigning the label to the documents if they are matched. Finally clustered documents are obtained through seeds affinity propagation via similarity measurement. Further the performance of the algorithm can be evaluated and improved.\",\"PeriodicalId\":191151,\"journal\":{\"name\":\"2012 International Conference on Recent Trends in Information Technology\",\"volume\":\"55 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2012-04-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2012 International Conference on Recent Trends in Information Technology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICRTIT.2012.6206802\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 International Conference on Recent Trends in Information Technology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICRTIT.2012.6206802","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

摘要

领域适应是将知识从一个源领域转移到另一个相关领域的过程。在本文中,我们首先应用基于“共识正则化”的算法将多个源域合并为单个源域。在此基础上,提出了基于种子亲和性传播和一致性正则化算法的多领域自适应文档聚类算法。在有效聚类算法Affinity Propagation (AP)的基础上,提出了一种半监督文档聚类算法seed Affinity Propagation (SAP)。对标记和未标记的文档进行预处理,通过去除停止词、词干提取和查找词频等各种过程,并将其作为输入。预处理后,得到结构化文档。三集计算是一种特征提取技术,通过共同特征集、单边特征集和显著共同特征集方法来找出特征。然后计算文档的相似度度量,并在匹配的情况下为文档分配标签。最后通过相似度度量,通过种子亲和传播获得聚类文档。进一步评价和改进算法的性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Semi-supervised document clustering using Seeds affinity propagation and consensus algorithm in multi-domain settings
Domain adaptation is the process of transferring the knowledge to a different domain from a source domain but they are related. In this paper, we first apply `Consensus Regularization' based algorithm to merge multiple source domain to a single source domain. Then we propose multi-domain adaptation in document clustering using Seeds affinity propagation and Consensus Regularization Algorithm. A semi-supervised document clustering algorithm, called Seeds Affinity Propagation (SAP) is applied based on an effective clustering algorithm Affinity Propagation (AP). The labeled and unlabeled documents are preprocessed through various processes such as stop words removal, word stemming and finding word frequency and given as the input. After pre-processing, structured documents are obtained. Tri-set Computation, a feature extraction technique is used to find out the features through Co-feature set, Unilateral feature set and Significant Co-feature set methods. Then calculate the similarity measure of the documents and assigning the label to the documents if they are matched. Finally clustered documents are obtained through seeds affinity propagation via similarity measurement. Further the performance of the algorithm can be evaluated and improved.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信