Sampling operations on big data
V. Gadepally, Taylor Herr, Luke B. Johnson, Lauren Milechin, Maja Milosavljevic, B. A. Miller
2015 49th Asilomar Conference on Signals, Systems and Computers, published 2015-11-29
DOI: 10.1109/ACSSC.2015.7421398 (https://doi.org/10.1109/ACSSC.2015.7421398)
Citations: 9
Abstract
The 3Vs of Big Data (Volume, Velocity, and Variety) continue to pose a major challenge for systems and algorithms designed to store, process, and disseminate information for discovery and exploration under real-time constraints. Common signal processing operations such as sampling and filtering, which have been used for decades to compress signals, are often undefined for data characterized by heterogeneity, high dimensionality, and a lack of known structure. In this article, we describe and demonstrate an approach to sampling large datasets such as social media data. We evaluate the effect of sampling on a common predictive analytic: link prediction. Our results indicate that even a heavily sampled dataset can still yield meaningful link prediction results.
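To make the idea of sampling a graph before link prediction concrete, the following is a minimal Python sketch. The abstract does not specify the sampling scheme or the predictor used in the paper, so both choices here are assumptions made purely for illustration: uniform random edge sampling and a common-neighbors score. The function names (`sample_edges`, `build_adjacency`, `common_neighbor_scores`) and the toy edge list are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict
from itertools import combinations


def sample_edges(edges, fraction, seed=0):
    """Keep each edge independently with probability `fraction`.

    Assumption: uniform edge sampling; the paper's actual scheme may differ.
    """
    rng = random.Random(seed)
    return [edge for edge in edges if rng.random() < fraction]


def build_adjacency(edges):
    """Build an undirected adjacency map: node -> set of neighbors."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency


def common_neighbor_scores(adjacency):
    """Score each non-adjacent node pair by its number of shared neighbors.

    Pairs with no shared neighbor are omitted. A higher score suggests a
    more likely future link (an illustrative predictor, not the paper's).
    """
    scores = {}
    for u, v in combinations(sorted(adjacency), 2):
        if v in adjacency[u]:
            continue  # already linked, nothing to predict
        shared = len(adjacency[u] & adjacency[v])
        if shared:
            scores[(u, v)] = shared
    return scores


if __name__ == "__main__":
    # Hypothetical toy graph standing in for a social media interaction graph.
    toy_edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]

    full_scores = common_neighbor_scores(build_adjacency(toy_edges))
    sampled_edges = sample_edges(toy_edges, fraction=0.6, seed=1)
    sampled_scores = common_neighbor_scores(build_adjacency(sampled_edges))

    # Comparing the two score tables shows which predictions survive sampling.
    print("full graph:   ", full_scores)
    print("sampled graph:", sampled_scores)
```

Comparing the candidate links ranked on the full graph with those ranked on the sampled graph gives a rough sense of how much predictive signal is retained at a given sampling fraction. A real evaluation would use held-out edges and a ranking metric rather than a direct comparison of score tables.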