Clustering What Matters in Constrained Settings
Ragesh Jaiswal, Amit Kumar
Algorithmica, vol. 87, no. 8, pp. 1178-1198, published 2025-05-08
DOI: 10.1007/s00453-025-01317-9
Citations: 0
Abstract
Constrained clustering problems generalize classical clustering formulations, such as \(k\)-median and \(k\)-means, by imposing additional constraints on the feasibility of a clustering. There has been significant recent progress in obtaining approximation algorithms for these problems, in both the metric and the Euclidean settings. However, the outlier version of these problems, where the solution is allowed to leave out \(m\) points from the clustering, is not well understood. In this work, we give a general framework for reducing the outlier version of a constrained \(k\)-median or \(k\)-means problem to the corresponding outlier-free version with only a \((1+\varepsilon)\)-factor loss in the approximation ratio. The reduction is obtained by mapping the original instance of the problem to \(f(k, m, \varepsilon)\) instances of the outlier-free version, where \(f(k, m, \varepsilon) = \left(\frac{k+m}{\varepsilon}\right)^{O(m)}\). As specific applications, we get the following results:
The first FPT (in the parameters \(k\) and \(m\)) \((1+\varepsilon)\)-approximation algorithm for the outlier version of capacitated \(k\)-median and \(k\)-means in Euclidean spaces with hard capacities.
The first FPT (in the parameters \(k\) and \(m\)) \((3+\varepsilon)\)- and \((9+\varepsilon)\)-approximation algorithms for the outlier version of capacitated \(k\)-median and \(k\)-means, respectively, in general metric spaces with hard capacities.
The first FPT (in the parameters \(k\) and \(m\)) \((2-\delta)\)-approximation algorithm for the outlier version of the \(k\)-median problem under the Ulam metric.
Our work generalizes the results of Bhattacharya et al. and Agrawal et al. to a larger class of constrained clustering problems. Further, our reduction works for arbitrary metric spaces and so can extend clustering algorithms for outlier-free versions in both Euclidean and arbitrary metric spaces.
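To make the outlier objective concrete, the following Python sketch computes the unconstrained \(k\)-median cost when \(m\) points may be discarded, and then solves a tiny instance by calling an outlier-free solver on every choice of \(m\) discarded points. This is only an illustration under stated assumptions, not the framework of the paper: the naive enumeration produces \(\binom{n}{m}\) outlier-free instances, whereas the abstract states a bound of \(f(k, m, \varepsilon)\) instances that does not depend on \(n\). All function names are hypothetical, points are assumed to be tuples in Euclidean space, and \(k \le n - m\) is assumed. The shortcut of dropping the \(m\) farthest points applies to the unconstrained objective only; with capacities or other constraints it need not hold.

```python
from itertools import combinations
from math import dist, inf


def kmedian_cost(points, centers):
    """Outlier-free k-median cost: each point pays the distance to its nearest center."""
    return sum(min(dist(p, c) for c in centers) for p in points)


def outlier_kmedian_cost(points, centers, m):
    """Unconstrained k-median cost after discarding the m points farthest from the centers."""
    nearest = sorted(min(dist(p, c) for c in centers) for p in points)
    return sum(nearest[:len(points) - m])


def brute_force_kmedian(points, k):
    """Toy outlier-free solver (an assumption for this sketch): try every k-subset as centers."""
    best = min(combinations(points, k), key=lambda cs: kmedian_cost(points, cs))
    return list(best), kmedian_cost(points, best)


def naive_outlier_reduction(points, k, m, solver=brute_force_kmedian):
    """Naive reduction: solve one outlier-free instance per choice of m discarded points."""
    best_centers, best_cost = None, inf
    for outlier_idx in combinations(range(len(points)), m):
        drop = set(outlier_idx)
        kept = [p for i, p in enumerate(points) if i not in drop]
        centers, cost = solver(kept, k)
        if cost < best_cost:
            best_centers, best_cost = centers, cost
    return best_centers, best_cost


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (10, 0), (11, 0), (100, 0)]
    centers, cost = naive_outlier_reduction(pts, k=2, m=1)
    print(centers, cost)
```

On this small instance the point (100, 0) is the natural outlier: discarding it leaves two tight pairs, each served by one center. Any outlier-free solver with the same signature could be passed as `solver`, which is the sense in which the outlier version is reduced to the outlier-free one.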
About the journal:
Algorithmica is an international journal that publishes theoretical papers on algorithms addressing problems that arise in practical areas, and experimental papers of general appeal for their practical importance or techniques. The development of algorithms is an integral part of computer science. The increasing complexity and scope of computer applications make the design of efficient algorithms essential.
Algorithmica covers algorithms in applied areas such as VLSI, distributed computing, parallel processing, automated design, robotics, graphics, database design, and software tools, as well as algorithms in fundamental areas such as sorting, searching, data structures, computational geometry, and linear programming.
In addition, the journal features two special sections: Application Experience, presenting findings obtained from applications of theoretical results to practical situations, and Problems, offering short papers presenting problems on selected topics of computer science.