Fréchet Mean and p-Mean on the Unit Circle: Decidability, Algorithm, and Applications to Clustering on the Flat Torus

F. Cazals, Bernard Delmas, Timothée O'Donnell
DOI: 10.4230/LIPIcs.SEA.2021.15 (https://doi.org/10.4230/LIPIcs.SEA.2021.15). Pages 15:1-15:16. Published 2021.

Abstract

The center of mass of a point set lying on a manifold generalizes the celebrated Euclidean centroid and is ubiquitous in statistical analysis in non-Euclidean spaces. In this work, we give a complete characterization of the weighted p-mean of a finite set of angular values on the unit circle S^1, based on a decomposition of S^1 such that the functional of interest has at most one local minimum per cell. This characterization is used to show that the problem is decidable for rational angular values (a consequence of Lindemann's theorem on the transcendence of π) and to develop an effective algorithm parameterized by exact predicates. A robust implementation of this algorithm, based on multi-precision interval arithmetic, is also presented and is shown to be effective for large values of n and p. We use it as a building block to implement the k-means and k-means++ clustering algorithms on the flat torus, with applications to clustering protein molecular conformations. These algorithms are available in the Structural Bioinformatics Library (http://sbl.inria.fr). Our derivations are of interest in two respects. First, efficient p-mean calculations are relevant to developing principal component analysis on the flat torus encoding angular spaces, a particularly important case for describing molecular conformations. Second, our two-stage strategy stresses the interest of combinatorial methods for p-means, while also emphasizing the role of numerical issues.

2012 ACM Subject Classification: Theory of computation → Computational geometry
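To make the cell decomposition mentioned in the abstract concrete, here is a minimal floating-point sketch in Python. It is not the paper's exact-predicate algorithm and not the Structural Bioinformatics Library API; the names `circ_dist`, `p_mean_cost`, and `circular_p_mean` are hypothetical. The sketch assumes p ≥ 1 and cuts the circle at the antipodes of the data points. Between two consecutive antipodes every geodesic distance follows a single branch |θ − c_i|, so the cost is convex on that cell and a plain ternary search finds its unique local minimum.

```python
import math

TWO_PI = 2.0 * math.pi


def circ_dist(a, b):
    """Geodesic distance between two angles on the unit circle, in [0, pi]."""
    d = abs(a - b) % TWO_PI
    return min(d, TWO_PI - d)


def p_mean_cost(theta, angles, weights, p):
    """Weighted p-mean functional: sum_i w_i * d(theta, theta_i)^p."""
    return sum(w * circ_dist(theta, a) ** p for a, w in zip(angles, weights))


def circular_p_mean(angles, weights=None, p=2.0, iters=200):
    """Approximate a global minimizer of the p-mean functional (assumes p >= 1).

    Floating-point sketch only: no exact predicates, no interval arithmetic.
    Returns (theta, cost) with theta in [0, 2*pi).
    """
    if not angles:
        raise ValueError("need at least one angle")
    if weights is None:
        weights = [1.0] * len(angles)

    # Antipodes of the data points: the only places where a distance
    # term switches from one branch to the other.
    cuts = sorted((a + math.pi) % TWO_PI for a in angles)

    best_theta, best_cost = None, math.inf
    for i, lo in enumerate(cuts):
        # The last cell wraps around past 2*pi back to the first cut.
        hi = cuts[i + 1] if i + 1 < len(cuts) else cuts[0] + TWO_PI

        # Ternary search on the cell, where the cost is convex.
        a, b = lo, hi
        for _ in range(iters):
            m1 = a + (b - a) / 3.0
            m2 = b - (b - a) / 3.0
            if p_mean_cost(m1, angles, weights, p) <= p_mean_cost(m2, angles, weights, p):
                b = m2
            else:
                a = m1

        theta = (0.5 * (a + b)) % TWO_PI
        cost = p_mean_cost(theta, angles, weights, p)
        if cost < best_cost:
            best_theta, best_cost = theta, cost

    return best_theta, best_cost
```

For instance, `circular_p_mean([6.2, 0.1])` returns an angle just past 0, on the short arc joining the two inputs across the wrap-around, rather than their arithmetic mean near π. Each cost evaluation is O(n) and there are n cells, so this sketch runs in O(n²) time per search step; it is meant to illustrate the one-minimum-per-cell property, not to be efficient or numerically robust.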