{"title":"基于融合多特征的文档聚类协同训练方法","authors":"Yuanqing Wang, Wenjun Wang, Weidi Dai, Pengfei Jiao, Wei Yu","doi":"10.1109/ICISCE.2016.19","DOIUrl":null,"url":null,"abstract":"Document clustering is a popular topic in data mining and information retrieval. Most models and methods for this problem are based on computing the similarity between pair documents modeled in a space of all terms, or a new feature space obtained by applying a topic modeling technique for a given corpus. In this paper, we regard these two ideas as clustering on term feature and on semantic feature, and have an assumption that they can contribute to each other in clustering. Also, we propose a co-training approach for spectral clustering taking two features into account. Experiments on four real-world datasets show the feasibility and efficacy of our proposed approach compared with a number of the baseline methods.","PeriodicalId":6882,"journal":{"name":"2016 3rd International Conference on Information Science and Control Engineering (ICISCE)","volume":"148 1","pages":"38-43"},"PeriodicalIF":0.0000,"publicationDate":"2016-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"A Fused Multi-feature Based Co-training Approach for Document Clustering\",\"authors\":\"Yuanqing Wang, Wenjun Wang, Weidi Dai, Pengfei Jiao, Wei Yu\",\"doi\":\"10.1109/ICISCE.2016.19\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Document clustering is a popular topic in data mining and information retrieval. Most models and methods for this problem are based on computing the similarity between pair documents modeled in a space of all terms, or a new feature space obtained by applying a topic modeling technique for a given corpus. In this paper, we regard these two ideas as clustering on term feature and on semantic feature, and have an assumption that they can contribute to each other in clustering. Also, we propose a co-training approach for spectral clustering taking two features into account. Experiments on four real-world datasets show the feasibility and efficacy of our proposed approach compared with a number of the baseline methods.\",\"PeriodicalId\":6882,\"journal\":{\"name\":\"2016 3rd International Conference on Information Science and Control Engineering (ICISCE)\",\"volume\":\"148 1\",\"pages\":\"38-43\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-07-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 3rd International Conference on Information Science and Control Engineering (ICISCE)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICISCE.2016.19\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 3rd International Conference on Information Science and Control Engineering (ICISCE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICISCE.2016.19","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A Fused Multi-feature Based Co-training Approach for Document Clustering
Document clustering is a popular topic in data mining and information retrieval. Most models and methods for this problem are based on computing the similarity between pair documents modeled in a space of all terms, or a new feature space obtained by applying a topic modeling technique for a given corpus. In this paper, we regard these two ideas as clustering on term feature and on semantic feature, and have an assumption that they can contribute to each other in clustering. Also, we propose a co-training approach for spectral clustering taking two features into account. Experiments on four real-world datasets show the feasibility and efficacy of our proposed approach compared with a number of the baseline methods.