Fast Metric Embedding into the Hamming Cube

IF 1.6 · Zone 3, Computer Science · Q3 COMPUTER SCIENCE, THEORY & METHODS

SIAM Journal on Computing Pub Date : 2024-03-14 DOI:10.1137/22m1520220

Sjoerd Dirksen, Shahar Mendelson, Alexander Stollenwerk


Citations: 0

Abstract


Fast Metric Embedding into the Hamming Cube

SIAM Journal on Computing, Volume 53, Issue 2, Page 315-345, April 2024.
Abstract. We consider the problem of embedding a subset of ℝ^n into a low-dimensional Hamming cube in an almost isometric way. We construct a simple, data-oblivious, and computationally efficient map that achieves this task with high probability. We first apply a specific structured random matrix, which we call the double circulant matrix; using that matrix requires linear storage, and matrix-vector multiplication can be performed in near-linear time. We then binarize each vector by comparing each of its entries to a random threshold, selected uniformly at random from a well-chosen interval. We estimate the number of bits required for this encoding scheme in terms of two natural geometric complexity parameters of the set: its Euclidean covering numbers and its localized Gaussian complexity. The estimate we derive turns out to be the best that one can hope for, up to logarithmic terms. The key to the proof is a phenomenon of independent interest: we show that the double circulant matrix mimics the behavior of a Gaussian matrix in two important ways. First, it maps an arbitrary set in ℝ^n into a set of well-spread vectors. Second, it yields a fast near-isometric embedding of any finite subset of ℓ_2^n into ℓ_1^m. This embedding achieves the same dimension reduction as a Gaussian matrix in near-linear time, under a condition on the number of points to be embedded that is optimal up to logarithmic factors. This improves a well-known construction due to Ailon and Chazelle.
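The two-step scheme described in the abstract (a structured random matrix followed by thresholding against random offsets) can be illustrated with a minimal Python sketch. The abstract does not define the double circulant matrix, so everything specific here is an assumption made for illustration: two Gaussian circulant factors with random sign flips, row subsampling to m coordinates, the scaling by 1/sqrt(n), and the names `make_embedding`, `embed`, and the interval half-width `lam`. It should not be read as the authors' exact construction. What it does show is why such a map needs only linear storage and near-linear time: each circulant factor is stored as one vector and applied with an FFT.

```python
import numpy as np


def circulant_matvec(c, x):
    """Apply the circulant matrix with first column c to x in O(n log n) via FFT."""
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))


def make_embedding(n, m, lam, rng):
    """Draw the random parameters once; storage is O(n + m).

    Assumed structure (hypothetical, for illustration only): two Gaussian
    circulant factors, each preceded by random sign flips, then m sampled
    rows and m thresholds drawn uniformly from [-lam, lam].
    """
    return {
        "signs1": rng.choice([-1.0, 1.0], size=n),
        "signs2": rng.choice([-1.0, 1.0], size=n),
        "g1": rng.standard_normal(n),
        "g2": rng.standard_normal(n),
        "rows": rng.choice(n, size=m, replace=False),
        "tau": rng.uniform(-lam, lam, size=m),
    }


def embed(x, p):
    """Map x in R^n to an m-bit code in {0, 1}^m."""
    n = x.shape[0]
    # First circulant factor, rescaled so the norm of x is roughly preserved.
    y = circulant_matvec(p["g1"], p["signs1"] * x) / np.sqrt(n)
    # Second circulant factor.
    z = circulant_matvec(p["g2"], p["signs2"] * y)
    # Keep m coordinates and compare each one to its random threshold.
    return (z[p["rows"]] + p["tau"] >= 0).astype(np.uint8)


def hamming(a, b):
    """Number of positions where two bit codes differ."""
    return int(np.count_nonzero(a != b))


rng = np.random.default_rng(0)
n, m, lam = 512, 128, 3.0
params = make_embedding(n, m, lam, rng)

x = rng.standard_normal(n)
x /= np.linalg.norm(x)
near = x + 0.01 * rng.standard_normal(n) / np.sqrt(n)  # small perturbation of x
far = -x                                                # antipodal point

code_x, code_near, code_far = embed(x, params), embed(near, params), embed(far, params)
print(hamming(code_x, code_near), hamming(code_x, code_far))
```

In this sketch the Hamming distance between codes is expected to grow with the Euclidean distance between the inputs, which is the qualitative behavior the abstract describes; the precise scaling, the choice of interval, and the number of bits required are what the paper itself quantifies.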


Source journal

SIAM Journal on Computing · Engineering & Technology - Computer Science: Theory & Methods

CiteScore

4.60

Self-citation rate

0.00%

Articles published

Review time

6-12 weeks

About the journal: The SIAM Journal on Computing aims to provide coverage of the most significant work going on in the mathematical and formal aspects of computer science and nonnumerical computing. Submissions must be clearly written and make a significant technical contribution. Topics include but are not limited to analysis and design of algorithms, algorithmic game theory, data structures, computational complexity, computational algebra, computational aspects of combinatorics and graph theory, computational biology, computational geometry, computational robotics, the mathematical aspects of programming languages, artificial intelligence, computational learning, databases, information retrieval, cryptography, networks, distributed computing, parallel algorithms, and computer architecture.