An Improved Approximation Algorithm for the Capacitated Correlation Clustering Problem

IF 0.6 | Zone 4, Computer Science | JCR Q4, COMPUTER SCIENCE, THEORY & METHODS
Sai Ji, Yukun Cheng, Jingjing Tan, Zhongrui Zhao
International Journal of Foundations of Computer Science. Published 2023-04-27. DOI: 10.1142/s0129054123410010 (https://doi.org/10.1142/s0129054123410010)
Citations: 0

Abstract

The correlation clustering problem (CorCP) is a classical clustering problem that clusters data according to pairwise similarity. It has many applications, including interaction networks, cross-lingual link detection, and communication networks. In this paper, we study a practical generalization of the CorCP, called the capacitated correlation clustering problem (the capacitated CorCP), which is modeled on a labeled complete graph. In this graph, each vertex represents a piece of data. If two pieces of data are similar, the edge between the corresponding vertices is marked with a positive label +; otherwise, the edge is marked with a negative label −. The objective of the capacitated CorCP is to group similar data into the same cluster as far as possible while satisfying the cluster capacity constraint. To this end, we partition the vertex set of the labeled complete graph into clusters, each of size at most a given upper bound, so as to minimize the number of disagreements. The number of disagreements is the total number of positive edges between clusters plus the number of negative edges within clusters. Unlike the previous algorithm in [18], which handles the cluster-size constraint through a penalty measure, our algorithm for the capacitated CorCP directly outputs a feasible solution by iteratively constructing clusters based on a preset threshold. With a careful choice of the threshold and a refined analysis, we prove that our algorithm achieves an improved approximation ratio of 5.37. We also conduct a series of numerical experiments to demonstrate the effectiveness of our algorithm.
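To make the objective concrete, the following is a minimal sketch of how the number of disagreements of a capacity-feasible clustering could be evaluated. It is not the authors' approximation algorithm; it only scores a given partition. The function name `count_disagreements`, its parameters, and the small example graph are assumptions introduced here for illustration.

```python
from itertools import combinations


def count_disagreements(n, positive_edges, clusters, capacity):
    """Count disagreements of a clustering on a labeled complete graph.

    n: number of vertices, labeled 0..n-1 (hypothetical encoding)
    positive_edges: set of frozenset({u, v}) pairs labeled '+';
                    every other pair is treated as labeled '-'
    clusters: list of lists that partition range(n)
    capacity: upper bound on the size of each cluster
    """
    # Feasibility checks: a true partition that respects the capacity.
    members = sorted(v for cluster in clusters for v in cluster)
    if members != list(range(n)):
        raise ValueError("clusters must partition the vertex set")
    if any(len(cluster) > capacity for cluster in clusters):
        raise ValueError("a cluster exceeds the capacity")

    cluster_of = {v: i for i, cluster in enumerate(clusters) for v in cluster}

    disagreements = 0
    for u, v in combinations(range(n), 2):
        is_positive = frozenset((u, v)) in positive_edges
        same_cluster = cluster_of[u] == cluster_of[v]
        # A '+' edge between clusters or a '-' edge within a cluster disagrees.
        if is_positive != same_cluster:
            disagreements += 1
    return disagreements


# Illustrative instance: a '+' triangle on {0, 1, 2} plus the '+' edge {2, 3}.
example_positive = {frozenset(e) for e in [(0, 1), (1, 2), (0, 2), (2, 3)]}
with_capacity_3 = count_disagreements(4, example_positive, [[0, 1, 2], [3]], 3)
with_capacity_2 = count_disagreements(4, example_positive, [[0, 1], [2, 3]], 2)
```

In this toy instance, tightening the capacity from 3 to 2 forces the positive triangle to be split, so the best achievable number of disagreements increases. This is the effect the capacity constraint has on the objective.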
Source journal
International Journal of Foundations of Computer Science (Engineering & Technology: Computer Science, Theory & Methods)
CiteScore: 1.60
Self-citation rate: 12.50%
Articles published: 63
Review time: 3 months
Journal description: The International Journal of Foundations of Computer Science is a bimonthly journal that publishes articles contributing new theoretical results in all areas of the foundations of computer science. The theoretical and mathematical aspects covered include:
- Algebraic theory of computing and formal systems
- Algorithm and system implementation issues
- Approximation, probabilistic, and randomized algorithms
- Automata and formal languages
- Automated deduction
- Combinatorics and graph theory
- Complexity theory
- Computational biology and bioinformatics
- Cryptography
- Database theory
- Data structures
- Design and analysis of algorithms
- DNA computing
- Foundations of computer security
- Foundations of high-performance computing