A new data clustering approach for data mining in large databases
Authors: Cheng-Fa Tsai, Hang-Chang Wu, Chun-Wei Tsai
Venue: Proceedings International Symposium on Parallel Architectures, Algorithms and Networks. I-SPAN'02
Published: 2002-08-07
DOI: https://doi.org/10.1109/ISPAN.2002.1004300
Clustering is the unsupervised classification of patterns (data items, feature vectors, or observations) into groups (clusters). In data mining, clustering is useful for discovering distribution patterns in the underlying data. Clustering algorithms usually employ a distance-metric-based similarity measure to partition the database so that data points in the same partition are more similar to each other than to points in different partitions. In this paper, we present a new data clustering method for data mining in large databases. Our simulation results show that the proposed method performs better than a fast self-organizing map (FSOM) combined with the k-means approach (FSOM+k-means) and the genetic k-means algorithm (GKA). In addition, in all the cases we studied, our method produces much smaller errors than both the FSOM+k-means approach and GKA.
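As an illustration of the distance-based partitioning described above, the following is a minimal sketch of plain k-means, the building block shared by the two baselines named in the abstract. It is not the method proposed in the paper, which the abstract does not describe. The Euclidean metric, the function names, and the sum-of-squared-error measure are assumptions chosen for illustration; the abstract does not state how its errors are computed.

```python
import math
import random


def euclidean(p, q):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda j: euclidean(p, centers[j]))
        for p in points
    ]


def update(points, labels, centers):
    """Move each center to the mean of its members (empty clusters stay put)."""
    new_centers = []
    for j, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            new_centers.append(old)
            continue
        new_centers.append(
            tuple(sum(m[d] for m in members) / len(members) for d in range(len(old)))
        )
    return new_centers


def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means: alternate assignment and center update until labels settle."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: random distinct points as seeds
    labels = assign(points, centers)
    for _ in range(max_iter):
        centers = update(points, labels, centers)
        new_labels = assign(points, centers)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centers


def sum_squared_error(points, labels, centers):
    """Total squared distance from each point to its assigned center."""
    return sum(euclidean(p, centers[lab]) ** 2 for p, lab in zip(points, labels))


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    labels, centers = kmeans(data, k=2)
    print(labels, centers, sum_squared_error(data, labels, centers))
```

On the toy data in the usage lines, the two well-separated groups of three points end up in different partitions. A lower sum of squared errors means tighter clusters, which is one common way to compare clustering results.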