Performance analysis of MK-means clustering algorithm with normalization approach
Vaishali R. Patel, R. Mehta
2011 World Congress on Information and Communication Technologies, 2011-12-01. DOI: 10.1109/WICT.2011.6141380
Real-world applications in science and engineering are growing steadily, and data mining is an important stage in connecting research with these applications. Clustering uses unsupervised learning techniques to group data objects by similarity. Incomplete, noisy, and inconsistent data can slow down the knowledge discovery in databases (KDD) process. Data preprocessing techniques improve data quality and thereby help improve the accuracy and efficiency of the subsequent mining steps. Data cleaning is an important preprocessing task for avoiding redundancies during data integration. Normalization is a further preprocessing task that contributes to the success of the data mining process: the data to be analyzed are scaled to a specific range. K-means is a well-known partition-based clustering algorithm, but it has the drawback that both the number of clusters and the initial centroids must be supplied in advance. This paper proposes a modified K-means algorithm (MK-means) that initializes the centroids automatically. It also analyzes the performance of MK-means when combined with a data cleaning method and normalization techniques, and the results show an improvement in the performance of MK-means.
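The abstract describes a pipeline of cleaning, then normalization, then K-means with automatic centroid initialization. The sketch below is a minimal illustration of that pipeline under stated assumptions, not the authors' MK-means:

- The abstract does not give the MK-means initialization rule. `farthest_first_init` therefore substitutes a common farthest-first heuristic.
- `remove_duplicates` stands in for the cleaning step by dropping exact duplicate records.
- Min-max scaling is used as one standard way to map each attribute into a specific range.
- All function names are hypothetical, and the number of clusters `k` is still passed explicitly.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def remove_duplicates(rows: Sequence[Point]) -> List[Point]:
    """Stand-in cleaning step (assumption): drop exact duplicate records,
    keeping the first copy of each."""
    seen = set()
    cleaned: List[Point] = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            cleaned.append(row)
    return cleaned


def min_max_normalize(rows: Sequence[Point], new_min: float = 0.0,
                      new_max: float = 1.0) -> List[Point]:
    """Scale every attribute into [new_min, new_max]:
    v' = (v - min) / (max - min) * (new_max - new_min) + new_min."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    scaled_rows: List[Point] = []
    for row in rows:
        scaled = []
        for v, lo, hi in zip(row, lows, highs):
            span = hi - lo
            # A constant attribute carries no information; map it to new_min
            # instead of dividing by zero.
            scaled.append(new_min if span == 0 else
                          (v - lo) / span * (new_max - new_min) + new_min)
        scaled_rows.append(tuple(scaled))
    return scaled_rows


def dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points: Sequence[Point], k: int) -> List[Point]:
    """Hypothetical deterministic seeding, NOT the paper's MK-means rule:
    start at the point closest to the data mean, then repeatedly add the
    point whose distance to its nearest chosen centroid is largest."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    dim = len(points[0])
    mean = tuple(sum(p[d] for p in points) / len(points) for d in range(dim))
    centroids = [min(points, key=lambda p: dist2(p, mean))]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    return centroids


def kmeans(points: Sequence[Point], k: int,
           max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Standard Lloyd iterations seeded by farthest_first_init."""
    centroids = farthest_first_init(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: dist2(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments unchanged, so the iterations have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return labels, centroids


if __name__ == "__main__":
    # Toy data (invented for illustration): the second attribute has a much
    # larger range than the first and contains one duplicate record.
    raw = [(1.0, 200.0), (1.0, 200.0), (1.2, 210.0), (8.0, 900.0),
           (8.5, 950.0), (0.9, 190.0), (7.8, 880.0)]
    data = min_max_normalize(remove_duplicates(raw))
    labels, centroids = kmeans(data, k=2)
    print(labels)
```

Normalizing before clustering matters because K-means compares points by Euclidean distance. Without scaling, an attribute with a large numeric range (the second column in the toy data) would dominate the distance and therefore the cluster assignments.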