{"title":"Relational data partitioning using evolutionary game theory","authors":"L. Hall, Alireza Chakeri","doi":"10.1109/CIDM.2014.7008656","DOIUrl":null,"url":null,"abstract":"This paper presents a new approach for relational data partitioning using the notion of dominant sets. A dominant set is a subset of data points satisfying the constraints of internal homogeneity and external in-homogeneity, i.e. a cluster. However, since any subset of a dominant set cannot be a dominant set itself, dominant sets tend to be compact sets. Hence, in this paper, we present a novel approach to enumerate well distributed clusters where the number of clusters need not be known. When the number of clusters is known, in order to search the solution space appropriately, after finding each dominant set, data points are partitioned into two disjoint subsets of data points using spectral graph image segmentation methods to enumerate the other well distributed dominant sets. For the latter case, we introduce a new hierarchical approach for relational data partitioning using a new class of evolutionary game theory dynamics called InImDynamics which is very fast and linear, in computational time, with the number of data points. In this regard, at each level of the proposed hierarchy, Dunn's index is used to find the appropriate number of clusters. Then the objects are partitioned based on the projected number of clusters using game theoretic relations. The same method is applied to each partition to extract its underlying structure. Although the resulting clusters exist in their equivalent partitions, they may not be clusters of the entire data. Hence, they are checked for being an actual cluster and if they are not, they are extended to an existing cluster of the data. 
The approach can also be used to assign unseen data to existing clusters, as well.","PeriodicalId":117542,"journal":{"name":"2014 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CIDM.2014.7008656","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 7
Abstract
This paper presents a new approach to relational data partitioning based on the notion of dominant sets. A dominant set is a subset of data points that satisfies the constraints of internal homogeneity and external inhomogeneity, i.e., a cluster. However, since no subset of a dominant set can itself be a dominant set, dominant sets tend to be compact. Hence, in this paper, we present a novel approach to enumerating well-distributed clusters where the number of clusters need not be known. When the number of clusters is known, in order to search the solution space appropriately, after each dominant set is found the data points are partitioned into two disjoint subsets using spectral graph image segmentation methods, so that the other well-distributed dominant sets can be enumerated. For the latter case, we introduce a new hierarchical approach to relational data partitioning that uses a new class of evolutionary game theory dynamics called InImDynamics, which is very fast and whose computational time is linear in the number of data points. At each level of the proposed hierarchy, Dunn's index is used to find the appropriate number of clusters. The objects are then partitioned according to the projected number of clusters using game-theoretic relations. The same method is applied to each partition to extract its underlying structure. Although the resulting clusters are valid within their respective partitions, they may not be clusters of the entire data set. Hence, each is checked for being an actual cluster and, if it is not, it is extended to an existing cluster of the data. The approach can also be used to assign unseen data to existing clusters.
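As an illustration of two ingredients named in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes the relational data are given as a symmetric, non-negative similarity matrix (for dominant-set extraction) or dissimilarity matrix (for Dunn's index). The dominant set is extracted with plain discrete-time replicator dynamics, a standard choice in the dominant-set literature, which is swapped in here in place of the InImDynamics the paper introduces. The function names `dominant_set` and `dunn_index` and the parameters `iters`, `tol`, and `cutoff` are hypothetical, chosen only for this example.

```python
from itertools import combinations


def dominant_set(sim, iters=1000, tol=1e-9, cutoff=1e-4):
    """Extract one dominant set with discrete-time replicator dynamics.

    `sim` is a symmetric, non-negative similarity matrix with a zero diagonal.
    NOTE: this is ordinary replicator dynamics, not the paper's InImDynamics.
    """
    n = len(sim)
    x = [1.0 / n] * n  # start from the barycentre of the simplex
    for _ in range(iters):
        ax = [sum(sim[i][j] * x[j] for j in range(n)) for i in range(n)]
        xax = sum(x[i] * ax[i] for i in range(n))
        if xax == 0:
            break  # no similarity mass left to redistribute
        new_x = [x[i] * ax[i] / xax for i in range(n)]
        shift = sum(abs(a - b) for a, b in zip(new_x, x))
        x = new_x
        if shift < tol:
            break
    # The support of the converged weight vector is taken as the cluster.
    return [i for i in range(n) if x[i] > cutoff]


def dunn_index(dissim, labels):
    """Dunn's index: smallest between-cluster dissimilarity divided by
    the largest within-cluster dissimilarity (higher is better)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("Dunn's index needs at least two clusters")
    max_diameter = max(
        (dissim[i][j]
         for members in clusters.values()
         for i, j in combinations(members, 2)),
        default=0.0,
    )
    min_separation = min(
        dissim[i][j]
        for a, b in combinations(clusters.values(), 2)
        for i in a
        for j in b
    )
    if max_diameter == 0:
        return float("inf")  # every cluster is a singleton or has zero spread
    return min_separation / max_diameter


# Toy data: four points on a line, forming two obvious groups.
points = [0.0, 1.0, 10.0, 11.0]
dissim = [[abs(a - b) for b in points] for a in points]

good = dunn_index(dissim, [0, 0, 1, 1])  # 9.0: tight, well-separated clusters
bad = dunn_index(dissim, [0, 1, 0, 1])   # about 0.1: clusters interleave
```

In a hierarchy like the one the abstract describes, an index of this kind would be evaluated for several candidate cluster counts at each level and the best-scoring count kept; how the paper generates the candidate partitions is not specified in the abstract, so that step is left out here.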