Scalability of Correlation Clustering Through Constraint Reduction
Mamata Samal, V. Saradhi, Sukumar Nandi
Proceedings of the 1st IKDD Conference on Data Sciences, published 2014-03-21
DOI: https://doi.org/10.1145/2567688.2567695
Correlation clustering (CC) is a graph-based clustering method. Each edge of the graph is labeled positive or negative according to the similarity or dissimilarity of the pair of vertices it joins. The objective of CC is to group the vertices of the induced complete graph so as to maximize the number of positively labeled edges that lie within a group plus the number of negatively labeled edges that lie across groups. This objective is formulated as a semidefinite programming (SDP) problem, which has been well studied theoretically and yields encouraging approximation results. In this work we propose a scalable solution for the SDP formulation of correlation clustering (SDP-CC) by reducing the number of constraints. The proposed formulation is solved efficiently using the SDP-NAL tool. It is compared with another scalable variant, namely variable-reduction-based CC. Both scalable formulations are tested on synthetic and real-world data sets whose graph sizes range from 100 to 13000 vertices. Large-scale benchmark graph data sets, whose sizes range from 2395 to 13992 vertices, are also tested. The proposed formulation is shown to have an edge over the original SDP-CC formulation, the variable reduction variant of SDP-CC, and a constrained clustering method, namely constrained spectral clustering.
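To make the objective described above concrete, the following minimal sketch counts the agreements of a given clustering with a signed complete graph: positive edges inside a group plus negative edges across groups. It only illustrates the combinatorial objective, not the SDP relaxation or the constraint reduction proposed in the paper. The function name `agreements`, the dictionary-based graph encoding, and the toy graph are assumptions made for this illustration.

```python
from itertools import combinations


def agreements(labels, sign):
    """Count agreements between a clustering and a signed complete graph.

    labels: dict mapping vertex -> cluster id (hypothetical encoding)
    sign:   dict mapping frozenset({u, v}) -> +1 (similar) or -1 (dissimilar)
    """
    score = 0
    for u, v in combinations(sorted(labels), 2):
        same_cluster = labels[u] == labels[v]
        s = sign[frozenset((u, v))]
        # A positive edge agrees when both ends share a cluster;
        # a negative edge agrees when its ends are separated.
        if (s > 0 and same_cluster) or (s < 0 and not same_cluster):
            score += 1
    return score


# Toy graph on 4 vertices: pairs {0, 1} and {2, 3} are similar,
# every other pair is dissimilar.
sign = {frozenset(p): -1 for p in combinations(range(4), 2)}
sign[frozenset((0, 1))] = +1
sign[frozenset((2, 3))] = +1

two_groups = {0: "a", 1: "a", 2: "b", 3: "b"}
one_group = {v: "a" for v in range(4)}

print(agreements(two_groups, sign))  # all 6 edges agree
print(agreements(one_group, sign))   # only the 2 positive edges agree
```

On this toy graph the two-group clustering agrees with every edge, whereas placing all vertices in one group agrees only with the two positive edges. Maximizing this count over all clusterings is the combinatorial problem that the SDP formulation approximates.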