Exact Exponential Algorithms for Clustering Problems

F. Fomin, P. Golovach, Tanmay Inamdar, Nidhi Purohit, Saket Saurabh
Venue: International Symposium on Parameterized and Exact Computation
Published: 2022-08-14
DOI: 10.48550/arXiv.2208.06847 (https://doi.org/10.48550/arXiv.2208.06847)
Citations: 0

Abstract

In this paper we initiate a systematic study of exact algorithms for two well-known clustering problems, $k$-Median and $k$-Means. In $k$-Median, the input consists of a set $X$ of $n$ points belonging to a metric space, and the task is to select a subset $C \subseteq X$ of $k$ points as centers such that the sum of the distances from every point to its nearest center is minimized. In $k$-Means, the objective is instead to minimize the sum of the squares of these distances. It is easy to design an algorithm running in time $\max_{k\leq n} {n \choose k} n^{O(1)} = O^*(2^n)$ (the $O^*(\cdot)$ notation hides polynomial factors in $n$). We design the first non-trivial exact algorithms for these problems. In particular, we obtain an $O^*((1.89)^n)$-time exact algorithm for $k$-Median that works for any value of $k$. Our algorithm is quite general in that it does not use any properties of the underlying (metric) space; it does not even require the distances to satisfy the triangle inequality. Consequently, the same algorithm also works for $k$-Means. We complement this result by showing that the running time of our algorithm is asymptotically optimal, up to the base of the exponent. That is, unless ETH fails, there is no algorithm for these problems running in time $2^{o(n)} \cdot n^{O(1)}$. Finally, we consider the "supplier" versions of these clustering problems, where, in addition to the set $X$, we are given a set $F$ of $m$ candidate centers, and the objective is to find a subset of $k$ centers from $F$. The goal is still to minimize the $k$-Median/$k$-Means/$k$-Center objective. For these versions we give $O(2^n (mn)^{O(1)})$-time algorithms using subset convolution. We complement this result by showing that, under the Set Cover Conjecture, the supplier versions of these problems do not admit an exact algorithm running in time $2^{(1-\epsilon) n} (mn)^{O(1)}$.
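For orientation, the trivial baseline mentioned in the abstract (try every set of $k$ centers, at most ${n \choose k}$ candidates) can be sketched as below. This is only an illustration of that brute-force bound, not the paper's $O^*((1.89)^n)$ algorithm. The function name `brute_force_clustering`, the `dist` callback, and the `power` parameter are assumptions introduced here; `dist` is assumed to return non-negative values and, as in the abstract, need not satisfy the triangle inequality.

```python
from itertools import combinations
from typing import Callable, Sequence, Tuple, TypeVar

P = TypeVar("P")


def brute_force_clustering(
    points: Sequence[P],
    k: int,
    dist: Callable[[P, P], float],
    power: int = 1,
) -> Tuple[float, Tuple[int, ...]]:
    """Exhaustively search all k-subsets of `points` for the best centers.

    power=1 gives the k-Median objective, power=2 the k-Means objective.
    Returns (best_cost, indices_of_best_centers).
    Hypothetical helper for illustration; runs in C(n, k) * poly(n) time.
    """
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(points)")

    best_cost = float("inf")
    best_centers: Tuple[int, ...] = ()
    # Enumerate every candidate set of k centers chosen from the input points.
    for centers in combinations(range(n), k):
        # Each point pays the distance to its nearest center, raised to `power`.
        cost = sum(
            min(dist(p, points[c]) for c in centers) ** power for p in points
        )
        if cost < best_cost:
            best_cost, best_centers = cost, centers
    return best_cost, best_centers


if __name__ == "__main__":
    line = [0, 1, 10, 11]
    gap = lambda a, b: abs(a - b)
    print(brute_force_clustering(line, 2, gap))           # k-Median
    print(brute_force_clustering(line, 2, gap, power=2))  # k-Means
```

Because the sketch only calls `dist` as a black box, the same code covers both objectives. Summed over all $k$, the number of candidate center sets is at most $2^n$, which matches the $O^*(2^n)$ bound quoted above.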