{"title":"一种改进的有能力相关聚类问题逼近算法","authors":"Sai Ji, Yukun Cheng, Jingjing Tan, Zhongrui Zhao","doi":"10.1142/s0129054123410010","DOIUrl":null,"url":null,"abstract":"Correlation clustering problem (CorCP) is a classical clustering problem, which clusters data based on the similarity of data set, and has many applications in interaction networks, cross-lingual link detection, and communication networks, etc. In this paper, we study a practical generalization of the CorCP, called the capacitated correlation clustering problem (the capacitated CorCP), by constructing a labeled complete graph. On this labeled complete graph, each vertex represents a piece of data. If two pieces of data are similar, then the edge between the corresponding vertices is marked by a positive label [Formula: see text]. Otherwise, this edge is marked by a negative label −. The objective of the capacitated CorCP is to group some similar data sets into one cluster as far as possible, while satisfying the cluster capacity constraint. To achieve this objective, we shall partition the vertex set of the labeled complete graph into several clusters, each cluster’s size subjecting to an upper bound, so as to minimize the number of disagreements. Here the number of disagreements is defined as the total number of the edges with positive labels between clusters and the edges with negative labels within clusters. Different with the previous algorithm in [18], which subjects to the constraint on the cluster size by a penalty measure, we design an algorithm for the capacitated CorCP to directly output a feasible solution by iteratively constructing clusters based on a preset threshold. Through carefully setting the threshold and sophisticatedly analyzing, our algorithm is proved to have an improved approximation ratio of 5.37. In addition, we also conduct a series of numerical experiments to demonstrate the effectiveness of our algorithm.","PeriodicalId":50323,"journal":{"name":"International Journal of Foundations of Computer Science","volume":null,"pages":null},"PeriodicalIF":0.6000,"publicationDate":"2023-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"An Improved Approximation Algorithm for the Capacitated Correlation Clustering Problem\",\"authors\":\"Sai Ji, Yukun Cheng, Jingjing Tan, Zhongrui Zhao\",\"doi\":\"10.1142/s0129054123410010\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Correlation clustering problem (CorCP) is a classical clustering problem, which clusters data based on the similarity of data set, and has many applications in interaction networks, cross-lingual link detection, and communication networks, etc. In this paper, we study a practical generalization of the CorCP, called the capacitated correlation clustering problem (the capacitated CorCP), by constructing a labeled complete graph. On this labeled complete graph, each vertex represents a piece of data. If two pieces of data are similar, then the edge between the corresponding vertices is marked by a positive label [Formula: see text]. Otherwise, this edge is marked by a negative label −. The objective of the capacitated CorCP is to group some similar data sets into one cluster as far as possible, while satisfying the cluster capacity constraint. To achieve this objective, we shall partition the vertex set of the labeled complete graph into several clusters, each cluster’s size subjecting to an upper bound, so as to minimize the number of disagreements. Here the number of disagreements is defined as the total number of the edges with positive labels between clusters and the edges with negative labels within clusters. Different with the previous algorithm in [18], which subjects to the constraint on the cluster size by a penalty measure, we design an algorithm for the capacitated CorCP to directly output a feasible solution by iteratively constructing clusters based on a preset threshold. Through carefully setting the threshold and sophisticatedly analyzing, our algorithm is proved to have an improved approximation ratio of 5.37. In addition, we also conduct a series of numerical experiments to demonstrate the effectiveness of our algorithm.\",\"PeriodicalId\":50323,\"journal\":{\"name\":\"International Journal of Foundations of Computer Science\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.6000,\"publicationDate\":\"2023-04-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Foundations of Computer Science\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1142/s0129054123410010\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, THEORY & METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Foundations of Computer Science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1142/s0129054123410010","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
An Improved Approximation Algorithm for the Capacitated Correlation Clustering Problem
Correlation clustering problem (CorCP) is a classical clustering problem, which clusters data based on the similarity of data set, and has many applications in interaction networks, cross-lingual link detection, and communication networks, etc. In this paper, we study a practical generalization of the CorCP, called the capacitated correlation clustering problem (the capacitated CorCP), by constructing a labeled complete graph. On this labeled complete graph, each vertex represents a piece of data. If two pieces of data are similar, then the edge between the corresponding vertices is marked by a positive label [Formula: see text]. Otherwise, this edge is marked by a negative label −. The objective of the capacitated CorCP is to group some similar data sets into one cluster as far as possible, while satisfying the cluster capacity constraint. To achieve this objective, we shall partition the vertex set of the labeled complete graph into several clusters, each cluster’s size subjecting to an upper bound, so as to minimize the number of disagreements. Here the number of disagreements is defined as the total number of the edges with positive labels between clusters and the edges with negative labels within clusters. Different with the previous algorithm in [18], which subjects to the constraint on the cluster size by a penalty measure, we design an algorithm for the capacitated CorCP to directly output a feasible solution by iteratively constructing clusters based on a preset threshold. Through carefully setting the threshold and sophisticatedly analyzing, our algorithm is proved to have an improved approximation ratio of 5.37. In addition, we also conduct a series of numerical experiments to demonstrate the effectiveness of our algorithm.
期刊介绍:
The International Journal of Foundations of Computer Science is a bimonthly journal that publishes articles which contribute new theoretical results in all areas of the foundations of computer science. The theoretical and mathematical aspects covered include:
- Algebraic theory of computing and formal systems
- Algorithm and system implementation issues
- Approximation, probabilistic, and randomized algorithms
- Automata and formal languages
- Automated deduction
- Combinatorics and graph theory
- Complexity theory
- Computational biology and bioinformatics
- Cryptography
- Database theory
- Data structures
- Design and analysis of algorithms
- DNA computing
- Foundations of computer security
- Foundations of high-performance computing