An Improved K_Means Algorithm for Document Clustering Based on Knowledge Graphs
Authors: Xiaoli Wang, Ying Li, Meihong Wang, Zixiang Yang, Huailin Dong
Published in: 2018 11th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), 2018-10-01
DOI: 10.1109/CISP-BMEI.2018.8633187 (https://doi.org/10.1109/CISP-BMEI.2018.8633187)
Citations: 6
Abstract
The K_means algorithm is one of the typical clustering algorithms used in text mining. It is widely used in many areas because it is easy to implement and scales well to large datasets. However, when the traditional K_means algorithm is applied to text clustering, the random selection of the initial cluster centroids easily leads to local optima and unstable clustering results. To overcome this shortcoming, this paper proposes an improved K_means algorithm for document clustering based on two points: (i) we use concept distance to optimize the choice of the initial cluster centroids, which avoids the drawbacks caused by random selection; (ii) we adopt knowledge graphs to improve the traditional K_means text clustering algorithm by optimizing the calculation of text similarity. Theoretical analysis and experimental results show that the improved algorithm effectively improves the accuracy of text clustering.
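The abstract does not define the concept distance or the knowledge-graph similarity, so the following is only a minimal sketch of the general idea: K-means over documents with a deterministic, distance-based choice of initial centroids and a pluggable distance function. Farthest-first seeding and cosine distance over term counts are stand-ins chosen for illustration, not the authors' method, and all function names (`cosine_distance`, `farthest_first_seeds`, `mean_vector`, `kmeans`) are hypothetical.

```python
import math
from collections import Counter


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse term-weight vectors (dicts).
    Assumption: a stand-in for the paper's knowledge-graph-based similarity."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    if na == 0.0 or nb == 0.0:
        return 1.0
    return 1.0 - dot / (na * nb)


def farthest_first_seeds(vectors, k, distance):
    """Deterministic seeding (assumption: replaces the paper's concept-distance
    seeding). Start from index 0, then repeatedly add the point whose distance
    to its nearest already-chosen seed is largest."""
    seeds = [0]
    while len(seeds) < k:
        best = max(
            (i for i in range(len(vectors)) if i not in seeds),
            key=lambda i: min(distance(vectors[i], vectors[s]) for s in seeds),
        )
        seeds.append(best)
    return seeds


def mean_vector(vectors):
    """Component-wise mean of sparse vectors."""
    total = {}
    for v in vectors:
        for t, w in v.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(vectors) for t, w in total.items()}


def kmeans(vectors, k, distance=cosine_distance, max_iter=100):
    """K-means with non-random initial centroids; returns one label per vector."""
    centroids = [dict(vectors[i]) for i in farthest_first_seeds(vectors, k, distance)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each document goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: distance(v, centroids[c])) for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: recompute each centroid as the mean of its members.
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:
                centroids[c] = mean_vector(members)
    return labels


docs = [
    "apple banana fruit",
    "banana fruit juice",
    "engine car wheel",
    "car wheel road",
]
vectors = [Counter(d.split()) for d in docs]
labels = kmeans(vectors, k=2)
print(labels)
```

Because the seeds are chosen deterministically, repeated runs on the same input give the same clustering, which is the instability the abstract attributes to random initialization. Swapping `cosine_distance` for a similarity derived from a knowledge graph would only require passing a different `distance` callable.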