Approximating Metric Magnitude of Point Sets

Rayna Andreeva, James Ward, Primoz Skraba, Jie Gao, Rik Sarkar
{"title":"Approximating Metric Magnitude of Point Sets","authors":"Rayna Andreeva, James Ward, Primoz Skraba, Jie Gao, Rik Sarkar","doi":"arxiv-2409.04411","DOIUrl":null,"url":null,"abstract":"Metric magnitude is a measure of the \"size\" of point clouds with many\ndesirable geometric properties. It has been adapted to various mathematical\ncontexts and recent work suggests that it can enhance machine learning and\noptimization algorithms. But its usability is limited due to the computational\ncost when the dataset is large or when the computation must be carried out\nrepeatedly (e.g. in model training). In this paper, we study the magnitude\ncomputation problem, and show efficient ways of approximating it. We show that\nit can be cast as a convex optimization problem, but not as a submodular\noptimization. The paper describes two new algorithms - an iterative\napproximation algorithm that converges fast and is accurate, and a subset\nselection method that makes the computation even faster. It has been previously\nproposed that magnitude of model sequences generated during stochastic gradient\ndescent is correlated to generalization gap. Extension of this result using our\nmore scalable algorithms shows that longer sequences in fact bear higher\ncorrelations. We also describe new applications of magnitude in machine\nlearning - as an effective regularizer for neural network training, and as a\nnovel clustering criterion.","PeriodicalId":501444,"journal":{"name":"arXiv - MATH - Metric Geometry","volume":"33 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - MATH - Metric Geometry","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.04411","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Metric magnitude is a measure of the "size" of point clouds with many desirable geometric properties. It has been adapted to various mathematical contexts, and recent work suggests that it can enhance machine learning and optimization algorithms. However, its usability is limited by its computational cost when the dataset is large or when the computation must be carried out repeatedly (e.g., in model training). In this paper, we study the magnitude computation problem and show efficient ways of approximating it. We show that it can be cast as a convex optimization problem, but not as a submodular optimization. The paper describes two new algorithms: an iterative approximation algorithm that converges quickly and is accurate, and a subset selection method that makes the computation even faster. It has previously been proposed that the magnitude of model sequences generated during stochastic gradient descent is correlated with the generalization gap. Extending this result using our more scalable algorithms shows that longer sequences in fact bear higher correlations. We also describe new applications of magnitude in machine learning: as an effective regularizer for neural network training, and as a novel clustering criterion.
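
For context, the magnitude of a finite metric space {x_1, ..., x_n} is defined through the similarity matrix Z with entries Z_ij = exp(-d(x_i, x_j)): when Z is invertible, the magnitude is the sum of the weighting vector w solving Zw = 1, i.e. the sum of the entries of Z^{-1}. The minimal sketch below (Python with NumPy/SciPy) computes this quantity directly and also via a generic conjugate-gradient solve. The iterative routine is only an illustrative stand-in for "iterative approximation", not the algorithm proposed in the paper, and the function names are placeholders.

```python
import numpy as np
from scipy.sparse.linalg import cg
from scipy.spatial.distance import cdist


def magnitude_exact(points: np.ndarray) -> float:
    """Magnitude of a finite Euclidean point cloud.

    Builds the similarity matrix Z with Z[i, j] = exp(-d(x_i, x_j)) and
    sums the weighting vector w that solves Z w = 1.
    """
    Z = np.exp(-cdist(points, points))            # similarity matrix
    w = np.linalg.solve(Z, np.ones(len(points)))  # weighting vector
    return float(w.sum())


def magnitude_cg(points: np.ndarray) -> float:
    """Approximate magnitude via a generic iterative linear solve.

    For Euclidean point sets Z is symmetric positive definite, so
    conjugate gradients converges without forming Z^{-1} explicitly.
    This is a stand-in illustration, NOT the paper's algorithm.
    """
    Z = np.exp(-cdist(points, points))
    w, info = cg(Z, np.ones(len(points)))
    if info != 0:
        raise RuntimeError("conjugate gradient did not converge")
    return float(w.sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 3))  # 200 random points in R^3
    print(magnitude_exact(X), magnitude_cg(X))
```

For positive definite Z (which holds for Euclidean point sets), the same quantity is also the maximum over w of the concave quadratic 2·1^T w - w^T Z w, which is the flavor of convex formulation the abstract alludes to; the paper's exact formulation may differ.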