A nonparametric method to generate synthetic populations to adjust for complex sampling design features.

IF 1.2 | Region 4 (Mathematics) | JCR Q3, SOCIAL SCIENCES, MATHEMATICAL METHODS

Survey Methodology. Pub Date: 2014-06-01. Epub Date: 2014-06-27

Qi Dong, Michael R Elliott, Trivellore E Raghunathan

Volume 40(1), pages 29-46. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5708580/pdf/nihms921248.pdf

Citations: 0

Abstract

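Outside of the survey sampling literature, samples are often assumed to be generated by a simple random sampling process that produces independent and identically distributed (IID) samples. Many statistical methods are developed largely in this IID world. Applying these methods to data from complex sample surveys without allowing for the survey design features can lead to erroneous inferences. Hence, much time and effort have been devoted to developing statistical methods that analyze complex survey data and account for the sample design. This issue is particularly important when generating synthetic populations using finite population Bayesian inference, as is often done in missing data or disclosure risk settings, or when combining data from multiple surveys. By extending previous work in the finite population Bayesian bootstrap literature, we propose a method to generate synthetic populations from a posterior predictive distribution in a fashion that inverts the complex sampling design features and generates simple random samples from a superpopulation point of view, adjusting the complex data so that they can be analyzed as simple random samples. We consider a simulation study with a stratified, clustered, unequal-probability-of-selection sample design, and use the proposed nonparametric method to generate synthetic populations for the 2006 National Health Interview Survey (NHIS) and the Medical Expenditure Panel Survey (MEPS), both of which have stratified, clustered, unequal-probability-of-selection sample designs.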
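The abstract does not spell out the algorithm, so the following is only a rough illustration of the general idea behind a weighted finite population Bayesian bootstrap, not the authors' implementation. It assumes a single stratum with no clustering, sampling weights that are each at least 1 and sum to the population size N, and a weighted Pólya urn selection rule that is supplied here as an assumption rather than taken from the text above. The function name `weighted_polya_population` is hypothetical.

```python
import numpy as np


def weighted_polya_population(y, w, rng):
    """Sketch: expand a weighted sample of size n into one synthetic
    population of size N = sum(w) with a weighted Polya urn.

    Assumptions (not from the abstract): one stratum, no clusters,
    every weight >= 1, and the weights sum to the population size.
    """
    y = np.asarray(y)
    w = np.asarray(w, dtype=float)
    n = len(y)
    N = int(round(w.sum()))
    if N <= n or np.any(w < 1.0):
        raise ValueError("need weights >= 1 that sum to more than n")

    ratio = (N - n) / n          # urn increment per re-drawn unit
    counts = np.zeros(n)         # how often each unit was re-drawn so far

    # Draw the N - n unobserved units one at a time. A unit's chance
    # grows with its weight and with the number of times it was
    # already selected.
    for _ in range(N - n):
        numer = w - 1.0 + counts * ratio
        probs = numer / numer.sum()   # normalise to guard against rounding
        i = rng.choice(n, p=probs)
        counts[i] += 1

    # Each sampled unit appears once plus its number of re-draws.
    return np.repeat(y, 1 + counts.astype(int))


# Usage: four sampled values with unequal weights (illustrative numbers).
rng = np.random.default_rng(0)
population = weighted_polya_population([10.0, 20.0, 30.0, 40.0],
                                       [1, 2, 3, 4], rng)
print(len(population), population.mean())
```

Under these assumptions, each synthetic population can be analyzed with ordinary unweighted (simple random sample) methods. Repeating the draw several times and combining the results would be the usual way to reflect the uncertainty about the unobserved units. A stratified, clustered design like the ones described above would need additional steps that this sketch leaves out.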

Source journal

Survey Methodology (Mathematics: Statistics and Probability)

CiteScore: 0.80
Self-citation rate: 22.20%
Review time: >12 weeks

Journal description: The journal publishes articles dealing with various aspects of statistical development relevant to a statistical agency, such as design issues in the context of practical constraints, use of different data sources and collection techniques, total survey error, survey evaluation, research in survey methodology, time series analysis, seasonal adjustment, demographic studies, data integration, estimation and data analysis methods, and general survey systems development. The emphasis is placed on the development and evaluation of specific methodologies as applied to data collection or the data themselves.