{"title":"一种基于遗传算法的混合和分类大数据集聚类算法","authors":"Li Jie, G. Xinbo, Jiao Li-cheng","doi":"10.1109/ICCIMA.2003.1238108","DOIUrl":null,"url":null,"abstract":"In the field of data mining, it is often encountered to perform cluster analysis on large data sets with mixed numeric and categorical values. However, most existing clustering algorithms are only efficient for the numeric data rather than the mixed data set. For this purpose, this paper presents a novel clustering algorithm for these mixed data sets by modifying the common cost function, trace of the within cluster dispersion matrix. The genetic algorithm (GA) is used to optimize the new cost function to obtain valid clustering result. Experimental result illustrates that the GA-based new clustering algorithm is feasible for the large data sets with mixed numeric and categorical values.","PeriodicalId":385362,"journal":{"name":"Proceedings Fifth International Conference on Computational Intelligence and Multimedia Applications. ICCIMA 2003","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2003-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":"{\"title\":\"A GA-based clustering algorithm for large data sets with mixed and categorical values\",\"authors\":\"Li Jie, G. Xinbo, Jiao Li-cheng\",\"doi\":\"10.1109/ICCIMA.2003.1238108\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In the field of data mining, it is often encountered to perform cluster analysis on large data sets with mixed numeric and categorical values. However, most existing clustering algorithms are only efficient for the numeric data rather than the mixed data set. For this purpose, this paper presents a novel clustering algorithm for these mixed data sets by modifying the common cost function, trace of the within cluster dispersion matrix. The genetic algorithm (GA) is used to optimize the new cost function to obtain valid clustering result. Experimental result illustrates that the GA-based new clustering algorithm is feasible for the large data sets with mixed numeric and categorical values.\",\"PeriodicalId\":385362,\"journal\":{\"name\":\"Proceedings Fifth International Conference on Computational Intelligence and Multimedia Applications. ICCIMA 2003\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2003-09-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"10\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings Fifth International Conference on Computational Intelligence and Multimedia Applications. ICCIMA 2003\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICCIMA.2003.1238108\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings Fifth International Conference on Computational Intelligence and Multimedia Applications. ICCIMA 2003","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCIMA.2003.1238108","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A GA-based clustering algorithm for large data sets with mixed and categorical values
In the field of data mining, it is often encountered to perform cluster analysis on large data sets with mixed numeric and categorical values. However, most existing clustering algorithms are only efficient for the numeric data rather than the mixed data set. For this purpose, this paper presents a novel clustering algorithm for these mixed data sets by modifying the common cost function, trace of the within cluster dispersion matrix. The genetic algorithm (GA) is used to optimize the new cost function to obtain valid clustering result. Experimental result illustrates that the GA-based new clustering algorithm is feasible for the large data sets with mixed numeric and categorical values.