利用多来源大数据的预测建模，提高样本效率，减少调查无响应误差

IF 1.6 4区数学 Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS

Journal of Survey Statistics and Methodology Pub Date : 2023-06-10 DOI:10.1093/jssam/smad016

David Dutwin, Patrick Coyle, I. Bilgen, N. English

{"title":"利用多来源大数据的预测建模，提高样本效率，减少调查无响应误差","authors":"David Dutwin, Patrick Coyle, I. Bilgen, N. English","doi":"10.1093/jssam/smad016","DOIUrl":null,"url":null,"abstract":"\n Big data has been fruitfully leveraged as a supplement for survey data—and sometimes as its replacement—and in the best of worlds, as a “force multiplier” to improve survey analytics and insight. We detail a use case, the big data classifier (BDC), as a replacement to the more traditional methods of targeting households in survey sampling for given specific household and personal attributes. Much like geographic targeting and the use of commercial vendor flags, we detail the ability of BDCs to predict the likelihood that any given household is, for example, one that contains a child or someone who is Hispanic. We specifically build 15 BDCs with the combined data from a large nationally representative probability-based panel and a range of big data from public and private sources, and then assess the effectiveness of these BDCs to successfully predict their range of predicted attributes across three large survey datasets. For each BDC and each data application, we compare the relative effectiveness of the BDCs against historical sample targeting techniques of geographic clustering and vendor flags. Overall, BDCs offer a modest improvement in their ability to target subpopulations. We find classes of predictions that are consistently more effective, and others where the BDCs are on par with vendor flagging, though always superior to geographic clustering. We present some of the relative strengths and weaknesses of BDCs as a new method to identify and subsequently sample low incidence and other populations.","PeriodicalId":17146,"journal":{"name":"Journal of Survey Statistics and Methodology","volume":" ","pages":""},"PeriodicalIF":1.6000,"publicationDate":"2023-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Leveraging Predictive Modelling from Multiple Sources of Big Data to Improve Sample Efficiency and Reduce Survey Nonresponse Error\",\"authors\":\"David Dutwin, Patrick Coyle, I. Bilgen, N. English\",\"doi\":\"10.1093/jssam/smad016\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"\\n Big data has been fruitfully leveraged as a supplement for survey data—and sometimes as its replacement—and in the best of worlds, as a “force multiplier” to improve survey analytics and insight. We detail a use case, the big data classifier (BDC), as a replacement to the more traditional methods of targeting households in survey sampling for given specific household and personal attributes. Much like geographic targeting and the use of commercial vendor flags, we detail the ability of BDCs to predict the likelihood that any given household is, for example, one that contains a child or someone who is Hispanic. We specifically build 15 BDCs with the combined data from a large nationally representative probability-based panel and a range of big data from public and private sources, and then assess the effectiveness of these BDCs to successfully predict their range of predicted attributes across three large survey datasets. For each BDC and each data application, we compare the relative effectiveness of the BDCs against historical sample targeting techniques of geographic clustering and vendor flags. Overall, BDCs offer a modest improvement in their ability to target subpopulations. We find classes of predictions that are consistently more effective, and others where the BDCs are on par with vendor flagging, though always superior to geographic clustering. We present some of the relative strengths and weaknesses of BDCs as a new method to identify and subsequently sample low incidence and other populations.\",\"PeriodicalId\":17146,\"journal\":{\"name\":\"Journal of Survey Statistics and Methodology\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":1.6000,\"publicationDate\":\"2023-06-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Survey Statistics and Methodology\",\"FirstCategoryId\":\"100\",\"ListUrlMain\":\"https://doi.org/10.1093/jssam/smad016\",\"RegionNum\":4,\"RegionCategory\":\"数学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"SOCIAL SCIENCES, MATHEMATICAL METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Survey Statistics and Methodology","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1093/jssam/smad016","RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"SOCIAL SCIENCES, MATHEMATICAL METHODS","Score":null,"Total":0}

引用次数: 0

摘要

大数据作为调查数据的补充，有时作为替代，在最好的情况下，作为“力量倍增器”来提高调查分析和洞察力。我们详细介绍了一个用例，即大数据分类器(BDC)，以替代更传统的针对特定家庭和个人属性进行调查抽样的目标家庭方法。与地理定位和商业供应商标志的使用非常相似，我们详细描述了bdc预测任何给定家庭(例如，家庭中有孩子或西班牙裔人)的可能性的能力。我们专门构建了15个bdc，结合了来自全国代表性的基于概率的大型面板的数据和来自公共和私人来源的一系列大数据，然后评估了这些bdc在三个大型调查数据集中成功预测其预测属性范围的有效性。对于每个BDC和每个数据应用程序，我们将BDC与地理聚类和供应商标志的历史样本目标技术的相对有效性进行了比较。总体而言，bdc在针对亚群体的能力方面略有提高。我们发现预测的类别始终更有效，而其他bdc与供应商标记相当，尽管总是优于地理聚类。我们提出了一些bdc的相对优势和劣势，作为识别和随后采样低发病率和其他人群的新方法。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Leveraging Predictive Modelling from Multiple Sources of Big Data to Improve Sample Efficiency and Reduce Survey Nonresponse Error

Big data has been fruitfully leveraged as a supplement for survey data—and sometimes as its replacement—and in the best of worlds, as a “force multiplier” to improve survey analytics and insight. We detail a use case, the big data classifier (BDC), as a replacement to the more traditional methods of targeting households in survey sampling for given specific household and personal attributes. Much like geographic targeting and the use of commercial vendor flags, we detail the ability of BDCs to predict the likelihood that any given household is, for example, one that contains a child or someone who is Hispanic. We specifically build 15 BDCs with the combined data from a large nationally representative probability-based panel and a range of big data from public and private sources, and then assess the effectiveness of these BDCs to successfully predict their range of predicted attributes across three large survey datasets. For each BDC and each data application, we compare the relative effectiveness of the BDCs against historical sample targeting techniques of geographic clustering and vendor flags. Overall, BDCs offer a modest improvement in their ability to target subpopulations. We find classes of predictions that are consistently more effective, and others where the BDCs are on par with vendor flagging, though always superior to geographic clustering. We present some of the relative strengths and weaknesses of BDCs as a new method to identify and subsequently sample low incidence and other populations.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Journal of Survey Statistics and Methodology Multiple-

CiteScore

4.30

自引率

9.50%

发文量

期刊介绍： The Journal of Survey Statistics and Methodology, sponsored by AAPOR and the American Statistical Association, began publishing in 2013. Its objective is to publish cutting edge scholarly articles on statistical and methodological issues for sample surveys, censuses, administrative record systems, and other related data. It aims to be the flagship journal for research on survey statistics and methodology. Topics of interest include survey sample design, statistical inference, nonresponse, measurement error, the effects of modes of data collection, paradata and responsive survey design, combining data from multiple sources, record linkage, disclosure limitation, and other issues in survey statistics and methodology. The journal publishes both theoretical and applied papers, provided the theory is motivated by an important applied problem and the applied papers report on research that contributes generalizable knowledge to the field. Review papers are also welcomed. Papers on a broad range of surveys are encouraged, including (but not limited to) surveys concerning business, economics, marketing research, social science, environment, epidemiology, biostatistics and official statistics. The journal has three sections. The Survey Statistics section presents papers on innovative sampling procedures, imputation, weighting, measures of uncertainty, small area inference, new methods of analysis, and other statistical issues related to surveys. The Survey Methodology section presents papers that focus on methodological research, including methodological experiments, methods of data collection and use of paradata. The Applications section contains papers involving innovative applications of methods and providing practical contributions and guidance, and/or significant new findings.