使用贝叶斯机器学习从非随机样本推断。

IF 1.6 4区 数学 Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS
Journal of Survey Statistics and Methodology Pub Date : 2022-01-20 eCollection Date: 2023-04-01 DOI:10.1093/jssam/smab049
Yutao Liu, Andrew Gelman, Qixuan Chen
{"title":"使用贝叶斯机器学习从非随机样本推断。","authors":"Yutao Liu,&nbsp;Andrew Gelman,&nbsp;Qixuan Chen","doi":"10.1093/jssam/smab049","DOIUrl":null,"url":null,"abstract":"<p><p>We consider inference from nonrandom samples in data-rich settings where high-dimensional auxiliary information is available both in the sample and the target population, with survey inference being a special case. We propose a regularized prediction approach that predicts the outcomes in the population using a large number of auxiliary variables such that the ignorability assumption is reasonable and the Bayesian framework is straightforward for quantification of uncertainty. Besides the auxiliary variables, we also extend the approach by estimating the propensity score for a unit to be included in the sample and also including it as a predictor in the machine learning models. We find in simulation studies that the regularized predictions using soft Bayesian additive regression trees yield valid inference for the population means and coverage rates close to the nominal levels. We demonstrate the application of the proposed methods using two different real data applications, one in a survey and one in an epidemiologic study.</p>","PeriodicalId":17146,"journal":{"name":"Journal of Survey Statistics and Methodology","volume":null,"pages":null},"PeriodicalIF":1.6000,"publicationDate":"2022-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10080218/pdf/smab049.pdf","citationCount":"7","resultStr":"{\"title\":\"Inference from Nonrandom Samples Using Bayesian Machine Learning.\",\"authors\":\"Yutao Liu,&nbsp;Andrew Gelman,&nbsp;Qixuan Chen\",\"doi\":\"10.1093/jssam/smab049\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>We consider inference from nonrandom samples in data-rich settings where high-dimensional auxiliary information is available both in the sample and the target population, with survey inference being a special case. We propose a regularized prediction approach that predicts the outcomes in the population using a large number of auxiliary variables such that the ignorability assumption is reasonable and the Bayesian framework is straightforward for quantification of uncertainty. Besides the auxiliary variables, we also extend the approach by estimating the propensity score for a unit to be included in the sample and also including it as a predictor in the machine learning models. We find in simulation studies that the regularized predictions using soft Bayesian additive regression trees yield valid inference for the population means and coverage rates close to the nominal levels. We demonstrate the application of the proposed methods using two different real data applications, one in a survey and one in an epidemiologic study.</p>\",\"PeriodicalId\":17146,\"journal\":{\"name\":\"Journal of Survey Statistics and Methodology\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":1.6000,\"publicationDate\":\"2022-01-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10080218/pdf/smab049.pdf\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Survey Statistics and Methodology\",\"FirstCategoryId\":\"100\",\"ListUrlMain\":\"https://doi.org/10.1093/jssam/smab049\",\"RegionNum\":4,\"RegionCategory\":\"数学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2023/4/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q2\",\"JCRName\":\"SOCIAL SCIENCES, MATHEMATICAL METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Survey Statistics and Methodology","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1093/jssam/smab049","RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2023/4/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"SOCIAL SCIENCES, MATHEMATICAL METHODS","Score":null,"Total":0}
引用次数: 7

摘要

我们考虑在数据丰富的环境中从非随机样本进行推断,其中样本和目标人群中都有高维辅助信息,调查推断是一种特殊情况。我们提出了一种正则化预测方法,该方法使用大量辅助变量来预测人群中的结果,使得可忽略性假设是合理的,并且贝叶斯框架对于不确定性的量化是直接的。除了辅助变量之外,我们还通过估计样本中包含的单元的倾向得分来扩展该方法,并将其作为机器学习模型中的预测器。我们在模拟研究中发现,使用软贝叶斯加性回归树的正则化预测对接近标称水平的总体均值和覆盖率产生了有效的推断。我们使用两种不同的真实数据应用,一种在调查中,另一种在流行病学研究中,展示了所提出方法的应用。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Inference from Nonrandom Samples Using Bayesian Machine Learning.

We consider inference from nonrandom samples in data-rich settings where high-dimensional auxiliary information is available both in the sample and the target population, with survey inference being a special case. We propose a regularized prediction approach that predicts the outcomes in the population using a large number of auxiliary variables such that the ignorability assumption is reasonable and the Bayesian framework is straightforward for quantification of uncertainty. Besides the auxiliary variables, we also extend the approach by estimating the propensity score for a unit to be included in the sample and also including it as a predictor in the machine learning models. We find in simulation studies that the regularized predictions using soft Bayesian additive regression trees yield valid inference for the population means and coverage rates close to the nominal levels. We demonstrate the application of the proposed methods using two different real data applications, one in a survey and one in an epidemiologic study.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
CiteScore
4.30
自引率
9.50%
发文量
40
期刊介绍: The Journal of Survey Statistics and Methodology, sponsored by AAPOR and the American Statistical Association, began publishing in 2013. Its objective is to publish cutting edge scholarly articles on statistical and methodological issues for sample surveys, censuses, administrative record systems, and other related data. It aims to be the flagship journal for research on survey statistics and methodology. Topics of interest include survey sample design, statistical inference, nonresponse, measurement error, the effects of modes of data collection, paradata and responsive survey design, combining data from multiple sources, record linkage, disclosure limitation, and other issues in survey statistics and methodology. The journal publishes both theoretical and applied papers, provided the theory is motivated by an important applied problem and the applied papers report on research that contributes generalizable knowledge to the field. Review papers are also welcomed. Papers on a broad range of surveys are encouraged, including (but not limited to) surveys concerning business, economics, marketing research, social science, environment, epidemiology, biostatistics and official statistics. The journal has three sections. The Survey Statistics section presents papers on innovative sampling procedures, imputation, weighting, measures of uncertainty, small area inference, new methods of analysis, and other statistical issues related to surveys. The Survey Methodology section presents papers that focus on methodological research, including methodological experiments, methods of data collection and use of paradata. The Applications section contains papers involving innovative applications of methods and providing practical contributions and guidance, and/or significant new findings.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信