Sampling operations on big data

2015 49th Asilomar Conference on Signals, Systems and Computers
Pub Date: 2015-11-29
DOI: 10.1109/ACSSC.2015.7421398

V. Gadepally, Taylor Herr, Luke B. Johnson, Lauren Milechin, Maja Milosavljevic, B. A. Miller

Citations: 9

Abstract

The 3Vs of Big Data (Volume, Velocity, and Variety) continue to pose a large challenge for systems and algorithms designed to store, process, and disseminate information for discovery and exploration under real-time constraints. Common signal processing operations such as sampling and filtering, which have been used for decades to compress signals, are often undefined for data characterized by heterogeneity, high dimensionality, and a lack of known structure. In this article, we describe and demonstrate an approach to sampling large datasets such as social media data. We evaluate the effect of sampling on a common predictive analytic: link prediction. Our results indicate that even a heavily sampled dataset can still yield meaningful link prediction results.
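The abstract does not specify which sampling scheme or link prediction method the authors used. As a minimal sketch, assuming uniform (Bernoulli) edge sampling and common-neighbors scoring, both swapped in purely for illustration and not taken from the paper, the code below shows one way to measure how much of a graph's link prediction ranking survives sampling. The helper names (`sample_edges`, `common_neighbor_scores`, `top_k_links`, `overlap_at_k`) and the sampling rate `p` are hypothetical.

```python
import random
from collections import defaultdict
from itertools import combinations


def sample_edges(edges, p, seed=0):
    """Assumed scheme: Bernoulli edge sampling, keeping each edge with probability p."""
    rng = random.Random(seed)
    return [e for e in edges if rng.random() < p]


def build_adjacency(edges):
    """Build undirected adjacency sets from an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def common_neighbor_scores(edges):
    """Assumed predictor: score each non-adjacent pair by its shared-neighbor count.

    Enumerates all node pairs, so this is O(n^2) and only suitable for small graphs.
    """
    adj = build_adjacency(edges)
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v not in adj[u]:
            shared = len(adj[u] & adj[v])
            if shared > 0:
                scores[(u, v)] = shared
    return scores


def top_k_links(edges, k):
    """Return the k highest-scoring candidate links, breaking ties by node order."""
    scores = common_neighbor_scores(edges)
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [pair for pair, _ in ranked[:k]]


def overlap_at_k(full_edges, p, k, seed=0):
    """Fraction of the full graph's top-k predicted links recovered after sampling."""
    full = set(top_k_links(full_edges, k))
    sampled = set(top_k_links(sample_edges(full_edges, p, seed), k))
    return len(full & sampled) / max(len(full), 1)
```

A possible usage is to sweep the sampling rate, for example `for p in (1.0, 0.5, 0.1): print(p, overlap_at_k(edges, p, k=10))`, and watch how quickly the overlap decays. Comparing top-k rankings is only one illustrative yardstick; a held-out-edge precision measure would be an equally reasonable choice.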
