Clustering by attraction and distraction
J. Chongstitvatana, Wanwara Thubtimdang
2011 Eighth International Joint Conference on Computer Science and Software Engineering (JCSSE), 2011-05-11
DOI: 10.1109/JCSSE.2011.5930149 (https://doi.org/10.1109/JCSSE.2011.5930149)

Clustering is a data analysis task that groups similar objects together while separating them from dissimilar ones. Centroid-based clustering methods produce hyperspherical clusters and therefore cannot cluster correctly when similar objects do not form a hypersphere. This work proposes an agglomerative clustering method based on the concepts of attraction and distraction. Attraction is measured by the number of similar object pairs in the two clusters and by the sizes of the two clusters. Distraction is the possibility that other cluster pairs could be merged. The proposed algorithm is evaluated against the K-means algorithm: it gives higher accuracy than K-means on the iris and Haberman survival datasets, lower accuracy on the breast cancer and SPECT heart test datasets, and comparable accuracy on the wine dataset.