Authors: Kai Gao, Baoquan Zhang
Published in: Proceedings of 2014 International Conference on Modelling, Identification & Control
Publication date: 2014-12-01
DOI: 10.1109/ICMIC.2014.7020768 (https://doi.org/10.1109/ICMIC.2014.7020768)
Modelling on clustering algorithm based on iteration feature selection for micro-blog posts
With the arrival of the big data era, data mining and intelligent processing have become increasingly important, and new models for intelligent processing are needed. Because micro-blog posts are short texts with unreliable linguistic features and incomplete vocabulary, similar posts need to be analyzed and clustered together to support further data mining and recommendation. This paper builds on the classical k-means clustering algorithm and presents a novel modelling approach that partitions the data into k groups. Furthermore, a text feature selection model based on two-phase iteration is proposed. Based on this model, a micro-blog post clustering algorithm is presented. The proposed algorithm makes use of the partitioning idea and avoids the influence of noisy data. Experiments show the feasibility of the proposed approach, and some remaining problems and directions for future work are discussed at the end.
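For readers unfamiliar with the baseline, the following is a minimal sketch of plain k-means applied to short posts represented as term-count vectors. It illustrates only the classical partitioning step that the abstract builds on; it is not the authors' two-phase feature selection model, whose details the abstract does not give. The function names (`vectorize`, `kmeans`), the whitespace tokenization, and the choice of initial centroids are all assumptions made for illustration.

```python
from collections import Counter


def vectorize(posts):
    """Map each post to a term-count vector over a shared vocabulary.

    Assumption: tokens are lowercase whitespace-separated words.
    """
    vocab = sorted({w for p in posts for w in p.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for p in posts:
        v = [0.0] * len(vocab)
        for w, count in Counter(p.lower().split()).items():
            v[index[w]] = float(count)
        vectors.append(v)
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, init_indices, max_iter=100):
    """Classical k-means; returns one cluster label per input vector.

    init_indices (hypothetical parameter) names the k vectors used as
    initial centroids, which keeps the sketch deterministic.
    """
    centroids = [list(vectors[i]) for i in init_indices]
    labels = [-1] * len(vectors)
    for _ in range(max_iter):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


posts = [
    "cheap flights to paris",
    "paris flights sale",
    "football match tonight",
    "tonight football final",
]
print(kmeans(vectorize(posts), k=2, init_indices=[0, 2]))
```

On this toy input the two travel posts share one label and the two football posts share the other. Raw term counts treat every word as equally informative, which is the kind of weakness on short, noisy text that a feature selection stage is meant to address.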