Improved Bounds on the Dot Product under Random Projection and Random Sign Projection

A. Kabán
{"title":"Improved Bounds on the Dot Product under Random Projection and Random Sign Projection","authors":"A. Kabán","doi":"10.1145/2783258.2783364","DOIUrl":null,"url":null,"abstract":"Dot product is a key building block in a number of data mining algorithms from classification, regression, correlation clustering, to information retrieval and many others. When data is high dimensional, the use of random projections may serve as a universal dimensionality reduction method that provides both low distortion guarantees and computational savings. Yet, contrary to the optimal guarantees that are known on the preservation of the Euclidean distance cf. the Johnson-Lindenstrauss lemma, the existing guarantees on the dot product under random projection are loose and incomplete in the current data mining and machine learning literature. Some recent literature even suggested that the dot product may not be preserved when the angle between the original vectors is obtuse. In this paper we provide improved bounds on the dot product under random projection that matches the optimal bounds on the Euclidean distance. As a corollary, we elucidate the impact of the angle between the original vectors on the relative distortion of the dot product under random projection, and we show that the obtuse vs. acute angles behave symmetrically in the same way. In a further corollary we make a link to sign random projection, where we generalise earlier results. Numerical simulations confirm our theoretical results. Finally we give an application of our results to bounding the generalisation error of compressive linear classifiers under the margin loss.","PeriodicalId":243428,"journal":{"name":"Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"40","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2783258.2783364","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 40

Abstract

The dot product is a key building block in many data mining algorithms, from classification, regression, and correlation clustering to information retrieval and beyond. When data are high-dimensional, random projections serve as a universal dimensionality reduction method that provides both low-distortion guarantees and computational savings. Yet, in contrast to the optimal guarantees known for the preservation of the Euclidean distance (cf. the Johnson-Lindenstrauss lemma), the existing guarantees on the dot product under random projection in the data mining and machine learning literature are loose and incomplete. Some recent literature has even suggested that the dot product may not be preserved when the angle between the original vectors is obtuse. In this paper we provide improved bounds on the dot product under random projection that match the optimal bounds on the Euclidean distance. As a corollary, we elucidate the impact of the angle between the original vectors on the relative distortion of the dot product under random projection, and we show that obtuse and acute angles behave symmetrically in the same way. In a further corollary we make a link to sign random projection, where we generalise earlier results. Numerical simulations confirm our theoretical results. Finally, we apply our results to bound the generalisation error of compressive linear classifiers under the margin loss.
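As background for the abstract's claims: the standard route from Euclidean-distance guarantees to dot-product guarantees is the polarization identity x.y = (||x+y||^2 - ||x-y||^2)/4, which is why Johnson-Lindenstrauss-type distance bounds translate into (generally looser) dot-product bounds, and why the dependence on the angle between x and y matters. The minimal Python sketch below is not the paper's construction; the dimensions d and k and the synthetic correlated pair are illustrative choices. It checks two standard facts the paper builds on: a Gaussian random projection R with i.i.d. N(0, 1/k) entries preserves the dot product in expectation, and under sign random projection the fraction of disagreeing signs estimates the angle between the vectors (the classical Goemans-Williamson / Charikar identity).

    import numpy as np

    rng = np.random.default_rng(0)
    d, k = 1000, 500  # original and reduced dimensions (illustrative)

    x = rng.standard_normal(d)
    y = 0.5 * x + 0.5 * rng.standard_normal(d)  # correlated pair, acute angle

    # Gaussian random projection with i.i.d. N(0, 1/k) entries, so that
    # E[(Rx) . (Ry)] = x . y (an unbiased estimator of the dot product).
    R = rng.standard_normal((k, d)) / np.sqrt(k)
    print(f"dot product, original : {x @ y:.1f}")
    print(f"dot product, projected: {(R @ x) @ (R @ y):.1f}")

    # Sign random projection keeps only the signs of the projected
    # coordinates. For a Gaussian row r, P[sign(r.x) != sign(r.y)]
    # equals angle(x, y) / pi, so the fraction of disagreeing signs
    # estimates the angle.
    disagree = np.mean(np.sign(R @ x) != np.sign(R @ y))
    theta = np.arccos((x @ y) / (np.linalg.norm(x) * np.linalg.norm(y)))
    print(f"angle, true     : {theta:.3f} rad")
    print(f"angle, estimated: {np.pi * disagree:.3f} rad")

For a near-orthogonal pair the same experiment shows the relative distortion of the projected dot product growing large, since the estimator's fluctuations scale with ||x|| ||y|| while its mean x.y shrinks; quantifying this angle dependence is exactly what the paper's corollary addresses.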