Fast Manhattan sketches in data streams

Jelani Nelson, David P. Woodruff
Proceedings of the ... ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, vol. 69 1, pp. 99-110. Published 2010-06-06.
DOI: 10.1145/1807085.1807101 (https://doi.org/10.1145/1807085.1807101)
Citations: 35

Abstract

The $L_1$-distance, also known as the Manhattan or taxicab distance, between two vectors $x, y \in \mathbb{R}^n$ is $\sum_{i=1}^{n} |x_i - y_i|$. Approximating this distance is a fundamental primitive on massive databases, with applications to clustering, nearest neighbor search, network monitoring, regression, sampling, and support vector machines. We give the first 1-pass streaming algorithm for this problem in the turnstile model with $O^*(1/\varepsilon^2)$ space and $O^*(1)$ update time. The $O^*$ notation hides polylogarithmic factors in $\varepsilon$, $n$, and the precision required to store vector entries. All previous algorithms either required $\Omega(1/\varepsilon^3)$ space or $\Omega(1/\varepsilon^2)$ update time and/or could not work in the turnstile model (i.e., support an arbitrary number of updates to each coordinate). Our bounds are optimal up to $O^*(1)$ factors.
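To make the setting concrete, here is a minimal Python sketch of the exact distance and of a turnstile-stream estimator. It is not the algorithm of this paper. It is the classic Cauchy-projection (1-stable) sketch usually attributed to Indyk, shown because every update touches all k counters, which illustrates the kind of per-update cost the abstract describes for earlier algorithms. The names `l1_distance`, `CauchySketch`, `k`, and `seed` are assumptions made for this illustration, as is the reseeding trick used to regenerate random entries.

```python
import math
import random
import statistics


def l1_distance(x, y):
    """Exact Manhattan distance: sum of |x_i - y_i|."""
    return sum(abs(a - b) for a, b in zip(x, y))


class CauchySketch:
    """Linear sketch with k counters; each update touches all k of them.

    Illustrative only: this is the classic Cauchy-projection sketch,
    not the faster algorithm described in the abstract.
    """

    def __init__(self, k, seed=0):
        self.k = k
        self.seed = seed
        self.counters = [0.0] * k

    def _cauchy(self, row, i):
        # Regenerate entry (row, i) of the implicit random matrix on demand.
        # Assumption for the demo: row and i are both below 2**24. A real
        # streaming algorithm would use a small-space pseudorandom generator.
        rng = random.Random((self.seed << 48) ^ (row << 24) ^ i)
        # Inverse-CDF sampling of a standard Cauchy variable.
        return math.tan(math.pi * (rng.random() - 0.5))

    def update(self, i, delta):
        """Turnstile update: coordinate i changes by delta (may be negative)."""
        for row in range(self.k):
            self.counters[row] += delta * self._cauchy(row, i)

    def subtract(self, other):
        """Sketch of (x - y); valid when both sketches share k and seed."""
        out = CauchySketch(self.k, self.seed)
        out.counters = [a - b for a, b in zip(self.counters, other.counters)]
        return out

    def estimate(self):
        """Median of |counter| estimates the L1 norm (median of |Cauchy| is 1)."""
        return statistics.median(abs(c) for c in self.counters)
```

Because the sketch is linear, two streams sketched with the same `k` and `seed` can be compared by subtracting their counters and calling `estimate()`. Taking `k` on the order of $1/\varepsilon^2$ is the usual choice for a $(1 \pm \varepsilon)$ estimate with constant probability, so in this baseline the work per update also grows like $1/\varepsilon^2$.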