Stratified-sampling over social networks using mapreduce

R. Levin, Y. Kanza
{"title":"Stratified-sampling over social networks using mapreduce","authors":"R. Levin, Y. Kanza","doi":"10.1145/2588555.2588577","DOIUrl":null,"url":null,"abstract":"Sampling is being used in statistical surveys to select a subset of individuals from some population, to estimate properties of the population. In stratified sampling, the surveyed population is partitioned into homogeneous subgroups and individuals are selected within the subgroups, to reduce the sample size. In this paper we consider sampling of large-scale, distributed online social networks, and we show how to deal with cases where several surveys are conducted in parallel---in some surveys it may be desired to share individuals to reduce costs, while in other surveys, sharing should be minimized, e.g., to prevent survey fatigue. A multi-survey stratified sampling is the task of choosing the individuals for several surveys, in parallel, according to sharing constraints, without a bias. In this paper, we present a scalable distributed algorithm, designed for the MapReduce framework, for answering stratified-sampling queries over a population of a social network. We also present an algorithm to effectively answer multi-survey stratified sampling, and we show how to implement it using MapReduce. An experimental evaluation illustrates the efficiency of our algorithms and their effectiveness for multi-survey stratified sampling.","PeriodicalId":314442,"journal":{"name":"Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data","volume":"31 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"18","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2588555.2588577","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 18

Abstract

Sampling is being used in statistical surveys to select a subset of individuals from some population, to estimate properties of the population. In stratified sampling, the surveyed population is partitioned into homogeneous subgroups and individuals are selected within the subgroups, to reduce the sample size. In this paper we consider sampling of large-scale, distributed online social networks, and we show how to deal with cases where several surveys are conducted in parallel---in some surveys it may be desired to share individuals to reduce costs, while in other surveys, sharing should be minimized, e.g., to prevent survey fatigue. A multi-survey stratified sampling is the task of choosing the individuals for several surveys, in parallel, according to sharing constraints, without a bias. In this paper, we present a scalable distributed algorithm, designed for the MapReduce framework, for answering stratified-sampling queries over a population of a social network. We also present an algorithm to effectively answer multi-survey stratified sampling, and we show how to implement it using MapReduce. An experimental evaluation illustrates the efficiency of our algorithms and their effectiveness for multi-survey stratified sampling.
使用mapreduce对社交网络进行分层抽样
抽样在统计调查中被用来从某些人口中选择一个子集的个人,以估计人口的性质。在分层抽样中,被调查的人口被分成同质的亚组,并在亚组中选择个人,以减少样本量。在本文中,我们考虑对大规模、分布式的在线社交网络进行抽样,并展示了如何处理并行进行多项调查的情况——在一些调查中,可能希望共享个人以降低成本,而在其他调查中,共享应最小化,例如,防止调查疲劳。多调查分层抽样的任务是根据共享约束,无偏倚地为并行的若干调查选择个人。在本文中,我们提出了一种为MapReduce框架设计的可扩展分布式算法,用于回答社交网络人口的分层抽样查询。我们还提出了一种有效回答多调查分层抽样的算法,并展示了如何使用MapReduce实现它。实验评价表明了我们的算法的效率和它们对多调查分层抽样的有效性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信