{"title":"The Price of Hierarchical Clustering","authors":"Anna Arutyunova, Heiko Röglin","doi":"10.1007/s00453-025-01327-7","DOIUrl":null,"url":null,"abstract":"<div><p>Hierarchical Clustering is a popular tool for understanding the hereditary properties of a data set. Such a clustering is actually a sequence of clusterings that starts with the trivial clustering in which every data point forms its own cluster and then successively merges two existing clusters until all points are in the same cluster. A hierarchical clustering achieves an approximation factor of <span>\\(\\alpha \\)</span> if the costs of each <i>k</i>-clustering in the hierarchy are at most <span>\\(\\alpha \\)</span> times the costs of an optimal <i>k</i>-clustering. We study as cost functions the maximum (discrete) radius of any cluster (<i>k</i>-center problem) and the maximum diameter of any cluster (<i>k</i>-diameter problem). In general, the optimal clusterings do not form a hierarchy and hence an approximation factor of 1 cannot be achieved. We call the smallest approximation factor that can be achieved for any instance the <i>price of hierarchy</i>. For the <i>k</i>-diameter problem we improve the upper bound on the price of hierarchy to <span>\\(3+2\\sqrt{2}\\approx 5.83\\)</span>. Moreover we significantly improve the lower bounds for <i>k</i>-center and <i>k</i>-diameter, proving a price of hierarchy of exactly 4 and <span>\\(3+2\\sqrt{2}\\)</span>, respectively.</p></div>","PeriodicalId":50824,"journal":{"name":"Algorithmica","volume":"87 10","pages":"1420 - 1452"},"PeriodicalIF":0.7000,"publicationDate":"2025-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s00453-025-01327-7.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Algorithmica","FirstCategoryId":"94","ListUrlMain":"https://link.springer.com/article/10.1007/s00453-025-01327-7","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
Abstract
Hierarchical clustering is a popular tool for understanding the hereditary properties of a data set. Such a clustering is in fact a sequence of clusterings that starts with the trivial clustering, in which every data point forms its own cluster, and then successively merges two existing clusters until all points are in the same cluster. A hierarchical clustering achieves an approximation factor of \(\alpha \) if the cost of each k-clustering in the hierarchy is at most \(\alpha \) times the cost of an optimal k-clustering. We study as cost functions the maximum (discrete) radius of any cluster (k-center problem) and the maximum diameter of any cluster (k-diameter problem). In general, the optimal clusterings do not form a hierarchy, so an approximation factor of 1 cannot be achieved. We call the smallest approximation factor that can be achieved for any instance the price of hierarchy. For the k-diameter problem we improve the upper bound on the price of hierarchy to \(3+2\sqrt{2}\approx 5.83\). Moreover, we significantly improve the lower bounds for k-center and k-diameter, proving a price of hierarchy of exactly 4 and \(3+2\sqrt{2}\), respectively.
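To make the definition of the approximation factor concrete, the following is a minimal Python sketch, not the paper's construction or its k-center/k-diameter algorithms: it builds a hierarchy by greedily merging the pair of clusters whose union has the smallest diameter, then compares the k-diameter cost of each level of the hierarchy with a brute-force optimal k-clustering on a tiny instance. The function names, the merge rule, and the example points are illustrative assumptions.

```python
# Illustrative sketch (not the authors' algorithm): a greedy agglomerative
# hierarchy under the k-diameter objective, compared level by level with a
# brute-force optimal k-clustering on a small Euclidean instance.
from itertools import combinations
import math

def dist(p, q):
    return math.dist(p, q)

def diameter(cluster, points):
    """Maximum pairwise distance within one cluster (0 for singletons)."""
    return max((dist(points[i], points[j]) for i, j in combinations(cluster, 2)),
               default=0.0)

def cost(clustering, points):
    """k-diameter objective: the maximum diameter over all clusters."""
    return max(diameter(c, points) for c in clustering)

def greedy_hierarchy(points):
    """Return one clustering per k = n, n-1, ..., 1 by merging the pair of
    clusters whose union has the smallest diameter (a complete-linkage-style rule)."""
    clusters = [frozenset([i]) for i in range(len(points))]
    hierarchy = {len(clusters): list(clusters)}
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda ab: diameter(ab[0] | ab[1], points))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        hierarchy[len(clusters)] = list(clusters)
    return hierarchy

def optimal_cost(points, k):
    """Brute-force optimal k-diameter cost (feasible only for tiny instances)."""
    n = len(points)
    best = math.inf
    def assign(i, labels):
        nonlocal best
        if i == n:
            if len(set(labels)) == k:
                clustering = [frozenset(j for j, l in enumerate(labels) if l == c)
                              for c in set(labels)]
                best = min(best, cost(clustering, points))
            return
        for l in range(min(i + 1, k)):
            assign(i + 1, labels + [l])
    assign(0, [])
    return best

if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (4, 0), (5, 0), (10, 0)]
    hierarchy = greedy_hierarchy(pts)
    for k in range(len(pts), 0, -1):
        hier = cost(hierarchy[k], pts)
        opt = optimal_cost(pts, k)
        ratio = hier / opt if opt > 0 else 1.0
        print(f"k={k}: hierarchy cost {hier:.2f}, optimal cost {opt:.2f}, ratio {ratio:.2f}")
```

On such toy inputs the per-level ratios stay small; the paper's results concern the worst case over all instances, where the best achievable factor is exactly 4 for k-center and \(3+2\sqrt{2}\) for k-diameter.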
Journal Description
Algorithmica is an international journal which publishes theoretical papers on algorithms that address problems arising in practical areas, as well as experimental papers of general appeal for their practical importance or techniques. The development of algorithms is an integral part of computer science. The increasing complexity and scope of computer applications make the design of efficient algorithms essential.
Algorithmica covers algorithms in applied areas such as VLSI, distributed computing, parallel processing, automated design, robotics, graphics, database design, and software tools, as well as algorithms in fundamental areas such as sorting, searching, data structures, computational geometry, and linear programming.
In addition, the journal features two special sections: Application Experience, presenting findings obtained from applications of theoretical results to practical situations, and Problems, offering short papers presenting problems on selected topics of computer science.