Clustering What Matters in Constrained Settings

IF 0.7 · Zone 4, Computer Science · JCR Q4, COMPUTER SCIENCE, SOFTWARE ENGINEERING

Algorithmica Pub Date : 2025-05-08 DOI:10.1007/s00453-025-01317-9

Ragesh Jaiswal, Amit Kumar

Algorithmica, Volume 87, Issue 8, pp. 1178-1198

Citations: 0

Abstract

Constrained clustering problems generalize classical clustering formulations, such as \(k\)-median and \(k\)-means, by imposing additional constraints on the feasibility of a clustering. There has been significant recent progress in obtaining approximation algorithms for these problems, in both the metric and the Euclidean settings. However, the outlier version of these problems, where the solution is allowed to leave out \(m\) points from the clustering, is not well understood. In this work, we give a general framework for reducing the outlier version of a constrained \(k\)-median or \(k\)-means problem to the corresponding outlier-free version with only a \((1+\varepsilon )\)-factor loss in the approximation ratio. The reduction maps the original instance of the problem to \(f(k,m, \varepsilon )\) instances of the outlier-free version, where \(f(k, m, \varepsilon ) = \left( \frac{k+m}{\varepsilon }\right) ^{O(m)}\). As specific applications, we get the following results:

First FPT (in the parameters \(k\) and \(m\)) \((1+\varepsilon )\)-approximation algorithm for the outlier version of capacitated \(k\)-median and \(k\)-means in Euclidean spaces with hard capacities.
First FPT (in the parameters \(k\) and \(m\)) \((3+\varepsilon )\)- and \((9+\varepsilon )\)-approximation algorithms for the outlier version of capacitated \(k\)-median and \(k\)-means, respectively, in general metric spaces with hard capacities.
First FPT (in the parameters \(k\) and \(m\)) \((2-\delta )\)-approximation algorithm for the outlier version of the \(k\)-median problem under the Ulam metric.
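The first two results concern capacitated \(k\)-median and \(k\)-means with outliers. The sketch below is a toy illustration of that objective, not the paper's algorithm or reduction. For a fixed set of centers, it finds the cheapest assignment in which center \(j\) receives at most `capacities[j]` points (hard capacities) and at most \(m\) points are left out. The function name and the exhaustive search over all \((k+1)^n\) labelings are illustrative assumptions, so it is only meant for tiny inputs.

```python
from itertools import product
import math


def constrained_cost_with_outliers(points, centers, capacities, m, dist, power=1):
    """Cheapest cost of serving `points` from the fixed `centers` when
    center j may receive at most capacities[j] points (hard capacities)
    and at most m points may be left out as outliers.

    power=1 gives the k-median objective, power=2 the k-means objective.
    Exhaustive search over all (k+1)^n labelings: toy inputs only.
    Returns math.inf if no feasible assignment exists.
    """
    k = len(centers)
    outlier = k  # label k means "left out of the clustering"
    best = math.inf
    for labels in product(range(k + 1), repeat=len(points)):
        if labels.count(outlier) > m:
            continue  # too many outliers
        if any(labels.count(j) > capacities[j] for j in range(k)):
            continue  # some center exceeds its hard capacity
        cost = sum(dist(p, centers[j]) ** power
                   for p, j in zip(points, labels) if j != outlier)
        best = min(best, cost)
    return best


if __name__ == "__main__":
    def line_dist(a, b):
        return abs(a - b)  # 1-D Euclidean distance

    pts = [0, 1, 2, 10, 11, 50]
    print(constrained_cost_with_outliers(pts, [1, 10], [3, 3], 1, line_dist))
    print(constrained_cost_with_outliers(pts, [1, 10], [2, 3], 1, line_dist))
```

With capacities `[3, 3]` and \(m = 1\), the far point 50 is dropped and the cost is 3. Tightening the first capacity to 2 forces point 2 onto center 10, and the cost rises to 10. This shows how hard capacities interact with the choice of outliers.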

Our work generalizes the results of Bhattacharya et al. and Agrawal et al. to a larger class of constrained clustering problems. Further, since our reduction works for arbitrary metric spaces, it can extend outlier-free clustering algorithms in both Euclidean and general metric spaces to their outlier versions.
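Because the reduction is stated for arbitrary metric spaces, the clustering objective only needs a distance function. One non-Euclidean example is the Ulam metric used in the third result. There, the distance between two permutations of \(n\) symbols is \(n\) minus the length of their longest common subsequence, equivalently the minimum number of "move one symbol" operations. The sketch below computes it via a longest increasing subsequence and plugs it into the plain (uncapacitated) \(k\)-median-with-outliers objective for fixed centers. Both function names are hypothetical, and neither implements the paper's reduction or its \((2-\delta )\)-approximation.

```python
from bisect import bisect_left


def ulam_distance(p, q):
    """Ulam distance between permutations p and q of the same symbols:
    len(p) minus the length of their longest common subsequence, i.e. the
    minimum number of "remove one symbol and reinsert it elsewhere" moves.
    The LCS of two permutations is computed as a longest increasing
    subsequence (patience sorting), so this runs in O(n log n).
    """
    if len(set(p)) != len(p) or len(p) != len(q) or set(p) != set(q):
        raise ValueError("p and q must be permutations of the same symbols")
    pos = {symbol: i for i, symbol in enumerate(p)}
    seq = [pos[symbol] for symbol in q]  # q rewritten as positions in p
    tails = []  # tails[i]: smallest tail of an increasing run of length i+1
    for x in seq:
        i = bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)
        else:
            tails[i] = x
    return len(p) - len(tails)


def kmedian_cost_with_outliers(points, centers, m, dist):
    """Uncapacitated k-median cost for fixed centers with m outliers:
    each point pays the distance to its nearest center, and the m
    largest payments are discarded."""
    paid = sorted(min(dist(x, c) for c in centers) for x in points)
    return sum(paid[:max(len(paid) - m, 0)])


if __name__ == "__main__":
    identity = (0, 1, 2, 3, 4)
    pts = [identity, (1, 0, 2, 3, 4), (0, 2, 1, 3, 4), (4, 3, 2, 1, 0)]
    print([ulam_distance(x, identity) for x in pts])
    print(kmedian_cost_with_outliers(pts, [identity], 1, ulam_distance))
```

Running it prints `[0, 1, 1, 4]` and then `2`. With one outlier allowed, the reversed permutation (distance 4) is discarded, leaving the cost \(0 + 1 + 1 = 2\). Only `dist` changes between metrics, which is the property a metric-agnostic reduction relies on.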


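Source journal

Algorithmica (Engineering & Technology: Computer Science, Software Engineering)

CiteScore: 2.80
Self-citation rate: 9.10%
Articles published: 158
Review time: 12 months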

Journal description: Algorithmica is an international journal which publishes theoretical papers on algorithms that address problems arising in practical areas, and experimental papers of general appeal for practical importance or techniques. The development of algorithms is an integral part of computer science. The increasing complexity and scope of computer applications makes the design of efficient algorithms essential. Algorithmica covers algorithms in applied areas such as VLSI, distributed computing, parallel processing, automated design, robotics, graphics, database design, and software tools, as well as algorithms in fundamental areas such as sorting, searching, data structures, computational geometry, and linear programming. In addition, the journal features two special sections: Application Experience, presenting findings obtained from applications of theoretical results to practical situations, and Problems, offering short papers presenting problems on selected topics of computer science.