Lotus: Evolutionary Blind Regression over Noisy Crowdsourced Data

2018 15th Annual IEEE International Conference on Sensing, Communication, and Networking (SECON). Pub Date: 2018-06-11. DOI: 10.1109/SAHCN.2018.8397096

C. Li, Shan Chang, Hongzi Zhu, Hang Chen, Ting Lu


Citations: 1

Abstract

In mobile crowd sensing (MCS) applications, a public model of a system or phenomenon is expected to be derived, through regression modeling, from sensory data (observations) collected by mobile device users. The unique features of MCS data pose new challenges for the regression task. First, observations are error-prone and private, making it very difficult to derive an accurate model without acquiring raw data. Second, observations are non-stationary and opportunistically generated, calling for an evolutionary model-updating mechanism. Last, mobile devices are resource-constrained, which demands lightweight regression schemes. In this paper, we propose Lotus, an evolutionary blind regression scheme for MCS settings. The core idea is first to select a 'maximum-safe-subset' of the observations stored locally across all participants, that is, a subset containing half of the observations such that the corresponding regression model has the minimum residual sum of squares. This implies that the inconsistency between observations in the subset is minimized. Since the maximum-safe-subset selection problem is NP-hard, a distributed greedy hill-climbing algorithm is proposed. Then, based on the resulting regression model, more observations are checked, and the selected ones are used to refine the model. As observations keep arriving, newly selected 'safe' observations are used to evolve the model. To preserve data privacy, a one-time pad masking mechanism and a blocking scheme are integrated into the regression estimation process. Intensive theoretical analysis and extensive trace-driven simulations are conducted, and the results demonstrate the efficacy of the Lotus design.
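The abstract does not spell out the distributed algorithm, so the following is only a minimal, centralized sketch of the subset-selection idea it describes: find half of the observations whose least-squares fit has a small residual sum of squares. It swaps in a concentration-style greedy step (refit, then keep the half of all observations with the smallest squared residuals), which is an assumption and not the authors' method. The function names `fit_ols`, `rss`, and `greedy_safe_subset` are hypothetical, and the privacy mechanisms (one-time pad masking, blocking) are not modeled.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients for y ~ X (X must already hold an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def rss(X, y, beta):
    """Residual sum of squares of the model beta on the rows of (X, y)."""
    r = y - X @ beta
    return float(r @ r)


def greedy_safe_subset(X, y, seed=0, max_iter=100):
    """Greedy hill climbing toward a half-size subset with low RSS.

    Assumption: a centralized concentration step stands in for the paper's
    distributed algorithm. Each accepted step does not increase the RSS,
    but the search may stop at a local optimum.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    h = n // 2  # the subset holds half of the observations

    # Start from a random half-size subset and fit a model on it.
    subset = np.sort(rng.choice(n, size=h, replace=False))
    beta = fit_ols(X[subset], y[subset])
    best = rss(X[subset], y[subset], beta)

    for _ in range(max_iter):
        # Score every observation under the current model and keep the
        # h observations that agree with it best.
        sq_res = (y - X @ beta) ** 2
        candidate = np.sort(np.argsort(sq_res)[:h])
        cand_beta = fit_ols(X[candidate], y[candidate])
        cand_rss = rss(X[candidate], y[candidate], cand_beta)
        if cand_rss >= best:
            break  # no improvement: stop climbing
        subset, beta, best = candidate, cand_beta, cand_rss

    return subset, beta, best
```

Calling `greedy_safe_subset(X, y)` with a design matrix `X` (including a column of ones) and a response vector `y` returns the selected indices, the fitted coefficients, and the subset's RSS. Because the result depends on the random start, running it from several seeds and keeping the lowest RSS is a common way to reduce the risk of a poor local optimum.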
