Random Manhattan Indexing

2014 25th International Workshop on Database and Expert Systems Applications
Pub Date: 2014-12-04
DOI: 10.1109/DEXA.2014.51

B. Zadeh, S. Handschuh

Citations: 10

Abstract

Vector space models (VSMs) are mathematically well-defined frameworks that have been widely used in text processing. In these models, high-dimensional, often sparse vectors represent text units. In an application, the similarity of vectors, and hence of the text units that they represent, is computed by a distance formula. The high dimensionality of the vectors, however, is a barrier to the performance of methods that employ VSMs, so a dimensionality reduction technique is employed to alleviate this problem. This paper introduces a new method, called Random Manhattan Indexing (RMI), for constructing L1-normed VSMs at reduced dimensionality. RMI combines the construction of a VSM and dimensionality reduction into a single incremental, and thus scalable, procedure. To achieve this, RMI employs sparse Cauchy random projections.
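The idea of building an L1-preserving VSM incrementally can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes dense standard Cauchy index vectors (the abstract specifies a sparse variant, whose exact distribution is not given here) and the sample-median estimator, which works because a Cauchy projection of a difference vector is itself Cauchy with scale equal to the L1 distance. The names `index_vector`, `IncrementalL1Index`, and `DIM` are hypothetical.

```python
import math
import random
import statistics
import zlib

DIM = 2000  # reduced dimensionality; a hypothetical choice for this sketch


def index_vector(context, dim=DIM):
    """Return a reproducible vector of standard Cauchy samples for a context.

    Seeding from the context name lets the vector be regenerated on demand
    instead of stored (an assumption of this sketch, not taken from the paper).
    """
    rng = random.Random(zlib.crc32(context.encode("utf-8")))
    # Inverse-CDF sampling: tan(pi * (u - 0.5)) is standard Cauchy for u ~ U(0, 1).
    return [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(dim)]


class IncrementalL1Index:
    """Accumulates reduced-dimension vectors one (unit, context) pair at a time."""

    def __init__(self, dim=DIM):
        self.dim = dim
        self.vectors = {}

    def add(self, unit, context, weight=1.0):
        # Each observation adds the context's index vector to the unit's vector,
        # so the full high-dimensional co-occurrence matrix is never built.
        vec = self.vectors.setdefault(unit, [0.0] * self.dim)
        for i, r in enumerate(index_vector(context, self.dim)):
            vec[i] += weight * r

    def l1_estimate(self, a, b):
        # Median of absolute differences estimates the original L1 distance.
        va, vb = self.vectors[a], self.vectors[b]
        return statistics.median(abs(x - y) for x, y in zip(va, vb))


index = IncrementalL1Index()
observations = [("a", "x", 3), ("a", "y", 1), ("b", "x", 1), ("b", "z", 2)]
for unit, context, count in observations:
    index.add(unit, context, weight=count)

# In the original space a = (3, 1, 0) and b = (1, 0, 2), so the true L1 distance is 5.
estimate = index.l1_estimate("a", "b")
```

The estimate is approximate and tightens as `DIM` grows. Note that summing absolute differences of the projected vectors would not work here, since Cauchy variables have no finite mean; this is why a nonlinear estimator such as the median is used.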

