Survey Methodology: Latest Articles

Fully Synthetic Data for Complex Surveys.

IF 1.2, Zone 4 (Mathematics)

Survey Methodology, Pub Date: 2024-01-01, Epub Date: 2024-12-20

Shirley Mathur, Yajuan Si, Jerome P Reiter

Abstract: When seeking to release public use files for confidential data, statistical agencies can generate fully synthetic data. We propose an approach for making fully synthetic data from surveys collected with complex sampling designs. Our approach adheres to the general strategy proposed by Rubin (1993). Specifically, we generate pseudo-populations by applying the weighted finite population Bayesian bootstrap to account for survey weights, take simple random samples from those pseudo-populations, estimate synthesis models using these simple random samples, and release simulated data drawn from the models as public use files. To facilitate variance estimation, we use the framework of multiple imputation with two data generation strategies. In the first, we generate multiple data sets from each simple random sample. In the second, we generate a single synthetic data set from each simple random sample. We present multiple imputation combining rules for each setting. We illustrate the repeated sampling properties of the combining rules via simulation studies, including comparisons with synthetic data generation based on pseudo-likelihood methods. We apply the proposed methods to a subset of data from the American Community Survey.

Survey Methodology 50(2): 347-373. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11759325/pdf/

Citations: 0
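The abstract above mentions multiple imputation combining rules. The rules this paper derives for its two generation strategies are not reproduced here. As background only, the following is a minimal sketch of the earlier, widely cited combining rules for fully synthetic data from Raghunathan, Reiter and Rubin (2003): the point estimate is the average q_bar of the m per-data-set estimates, and the variance estimate is T_f = (1 + 1/m) b_m - u_bar, where b_m is the between-data-set variance of the estimates and u_bar is the average of their estimated variances. The function name and inputs are illustrative assumptions.

```python
import numpy as np


def combine_fully_synthetic(estimates, variances):
    """Classic fully synthetic combining rules (Raghunathan, Reiter and Rubin, 2003).

    estimates: point estimates q_l, one per synthetic data set (m of them).
    variances: the estimated variances u_l of those point estimates.
    Returns (q_bar, T_f). Illustrative sketch only; NOT the rules derived in this paper.
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    if m < 2 or u.size != m:
        raise ValueError("need m >= 2 estimates with matching variances")
    q_bar = q.mean()          # combined point estimate
    b_m = q.var(ddof=1)       # between-synthetic-data-set variance
    u_bar = u.mean()          # average within-data-set variance
    # T_f is not guaranteed to be positive; this sketch does not adjust for that.
    t_f = (1.0 + 1.0 / m) * b_m - u_bar
    return q_bar, t_f
```

With estimates [1.0, 2.0, 3.0] and variances all equal to 0.1, this returns q_bar = 2.0 and T_f of about 1.233.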

The anchoring method: Estimation of interviewer effects in the absence of interpenetrated sample assignment.

IF 0.9, Zone 4 (Mathematics)

Survey Methodology, Pub Date: 2022-06-01

Michael R Elliott, Brady T West, Xinyu Zhang, Stephanie Coffey

Abstract: Methodological studies of the effects that human interviewers have on the quality of survey data have long been limited by a critical assumption: that interviewers in a given survey are assigned random subsets of the larger overall sample (also known as interpenetrated assignment). Absent this type of study design, estimates of interviewer effects on survey measures of interest may reflect differences between interviewers in the characteristics of their assigned sample members, rather than recruitment or measurement effects specifically introduced by the interviewers. Previous attempts to approximate interpenetrated assignment have typically used regression models to condition on factors that might be related to interviewer assignment. We introduce a new approach for overcoming this lack of interpenetrated assignment when estimating interviewer effects. This approach, which we refer to as the "anchoring" method, leverages correlations between observed variables that are unlikely to be affected by interviewers ("anchors") and variables that may be prone to interviewer effects to remove components of within-interviewer correlations that lack of interpenetrated assignment may introduce. We consider both frequentist and Bayesian approaches, where the latter can make use of information about interviewer effect variances in previous waves of a study, if available. We evaluate this new methodology empirically using a simulation study, and then illustrate its application using real survey data from the Behavioral Risk Factor Surveillance System (BRFSS), where interviewer IDs are provided on public-use data files. While our proposed method shares some of the limitations of the traditional approach (namely, the need for variables associated with the outcome of interest that are also free of measurement error), it avoids the need for conditional inference and thus has improved inferential qualities when the focus is on marginal estimates, and it shows evidence of further reducing overestimation of larger interviewer effects relative to the traditional approach.

Survey Methodology 48(1): 25-48. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9983757/pdf/nihms-1832600.pdf

Citations: 0
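Interviewer effects are commonly summarized by an interviewer intraclass correlation: the share of outcome variance attributable to differences between interviewers. The sketch below is the conventional one-way random-effects ANOVA estimator of that quantity. It is not the anchoring method: it takes each interviewer's caseload at face value and therefore assumes interpenetrated (random) assignment, which is exactly the assumption the paper relaxes. The function name and the list-of-groups input layout are illustrative assumptions.

```python
import numpy as np


def anova_icc(groups):
    """One-way random-effects ANOVA estimate of the intraclass correlation.

    groups: list of 1-D sequences, one per interviewer, holding respondents' values.
    Assumes respondents were randomly assigned to interviewers (illustrative sketch).
    """
    arrays = [np.asarray(g, dtype=float) for g in groups]
    k = len(arrays)
    sizes = np.array([a.size for a in arrays], dtype=float)
    n_total = sizes.sum()
    if k < 2 or np.any(sizes < 1) or n_total <= k:
        raise ValueError("need >= 2 interviewers and some within-interviewer replication")
    grand_mean = np.concatenate(arrays).mean()
    group_means = np.array([a.mean() for a in arrays])
    # Between- and within-interviewer mean squares
    msb = np.sum(sizes * (group_means - grand_mean) ** 2) / (k - 1)
    msw = sum(np.sum((a - a.mean()) ** 2) for a in arrays) / (n_total - k)
    # Effective caseload size for unbalanced designs (equals the common size when balanced)
    n0 = (n_total - np.sum(sizes ** 2) / n_total) / (k - 1)
    return float((msb - msw) / (msb + (n0 - 1) * msw))
```

For example, anova_icc([[1, 2, 3], [4, 5, 6]]) gives 12.5 / 15.5, about 0.81. The ANOVA estimator can be negative when between-interviewer variation is smaller than chance would produce.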

A note on multiply robust predictive mean matching imputation with complex survey data.

IF 0.9, Zone 4 (Mathematics)

Survey Methodology, Pub Date: 2021-06-01, Epub Date: 2021-06-24

Sixia Chen, David Haziza, Alexander Stubblefield

Citations: 0

Optimum allocation for a dual-frame telephone survey.

IF 1.2, Zone 4 (Mathematics)

Survey Methodology, Pub Date: 2015-12-01, Epub Date: 2015-12-17

Kirk M Wolter, Xian Tao, Robert Montgomery, Philip J Smith

Citations: 0

Combining information from multiple complex surveys.

IF 0.9, Zone 4 (Mathematics)

Survey Methodology, Pub Date: 2014-12-01, Epub Date: 2014-12-19

Qi Dong, Michael R Elliott, Trivellore E Raghunathan

Citations: 0

A nonparametric method to generate synthetic populations to adjust for complex sampling design features.

IF 0.9, Zone 4 (Mathematics)

Survey Methodology, Pub Date: 2014-06-01, Epub Date: 2014-06-27

Qi Dong, Michael R Elliott, Trivellore E Raghunathan

Abstract: Outside of the survey sampling literature, samples are often assumed to be generated by a simple random sampling process that produces independent and identically distributed (IID) samples. Many statistical methods are developed largely in this IID world. Application of these methods to data from complex sample surveys without making allowance for the survey design features can lead to erroneous inferences. Hence, much time and effort have been devoted to developing statistical methods that analyze complex survey data and account for the sample design. This issue is particularly important when generating synthetic populations using finite population Bayesian inference, as is often done in missing data or disclosure risk settings, or when combining data from multiple surveys. By extending previous work in the finite population Bayesian bootstrap literature, we propose a method to generate synthetic populations from a posterior predictive distribution in a fashion that inverts the complex sampling design features and generates simple random samples from a superpopulation point of view, adjusting the complex data so that they can be analyzed as simple random samples. We consider a simulation study with a stratified, clustered, unequal-probability-of-selection sample design, and use the proposed nonparametric method to generate synthetic populations for the 2006 National Health Interview Survey (NHIS) and the Medical Expenditure Panel Survey (MEPS), which also use stratified, clustered, unequal-probability-of-selection sample designs.

Survey Methodology 40(1): 29-46. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5708580/pdf/nihms921248.pdf

Citations: 0
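The finite population Bayesian bootstrap that this method extends can be implemented as a weighted Pólya urn. The sketch below assumes the weighted Pólya urn scheme of the kind used in the weighted finite population Bayesian bootstrap literature: weights are rescaled to sum to the population size N, and at each of the N - n draws, sample unit i is selected with probability proportional to w_i - 1 + l_i (N - n)/n, where l_i counts earlier draws of unit i. It yields one synthetic population as indices into the sample; rerunning with different seeds gives multiple populations. This single-stage sketch ignores strata and clusters, which the paper's method is designed to handle, and it is not presented as the authors' exact algorithm. The function and argument names are illustrative.

```python
import numpy as np


def weighted_polya_urn(weights, N, rng=None):
    """Draw one synthetic population of size N from a weighted sample.

    Returns an integer array of length N indexing sample units: the first n
    entries are the observed units themselves, the remaining N - n are urn draws.
    Illustrative single-stage sketch of a weighted Polya urn (no strata or clusters).
    """
    rng = np.random.default_rng(rng)
    w = np.asarray(weights, dtype=float)
    n = w.size
    if not (isinstance(N, (int, np.integer)) and N > n):
        raise ValueError("N must be an integer larger than the sample size")
    w = w * N / w.sum()                # rescale weights to sum to N
    if np.any(w < 1):
        raise ValueError("rescaled weights must be >= 1")
    step = (N - n) / n                 # increment per earlier selection of a unit
    counts = np.zeros(n)               # l_i: times unit i has been drawn so far
    draws = np.empty(N - n, dtype=int)
    for k in range(N - n):
        # Unnormalized selection weights; they sum to (N - n) + k * step
        p = w - 1.0 + counts * step
        i = rng.choice(n, p=p / p.sum())
        counts[i] += 1
        draws[k] = i
    return np.concatenate([np.arange(n), draws])
```

Units with larger weights tend to be replicated more often in the synthetic population, so a simple random sample drawn from it can then be analyzed without weights, in the spirit of the abstract.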

Bayesian inference for finite population quantiles from unequal probability samples.

IF 0.9, Zone 4 (Mathematics)

Survey Methodology, Pub Date: 2012-12-01, Epub Date: 2012-12-19

Qixuan Chen, Michael R Elliott, Roderick J A Little

Abstract: This paper develops two Bayesian methods for inference about finite population quantiles of continuous survey variables from unequal probability sampling. The first method estimates cumulative distribution functions of the continuous survey variable by fitting a number of probit penalized spline regression models on the inclusion probabilities. The finite population quantiles are then obtained by inverting the estimated distribution function. This method is quite computationally demanding. The second method predicts non-sampled values by assuming a smoothly varying relationship between the continuous survey variable and the probability of inclusion, modeling both the mean function and the variance function using splines. The two Bayesian spline-model-based estimators yield a desirable balance between robustness and efficiency. Simulation studies show that both methods yield smaller root mean squared errors than the sample-weighted estimator and the ratio and difference estimators described by Rao, Kovar, and Mantel (RKM 1990), and are more robust to model misspecification than the regression-through-the-origin model-based estimator described in Chambers and Dunstan (1986). When the sample size is small, the 95% credible intervals of the two new methods have coverage closer to the nominal level than those of the sample-weighted estimator.

Survey Methodology 38(2): 203-214. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5708554/pdf/nihms921237.pdf

Citations: 0
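For comparison, the sample-weighted estimator that this abstract uses as a benchmark can be sketched by inverting the design-weighted empirical distribution function, with weights equal to inverse inclusion probabilities. The exact form is an assumption (conventions for ties and interpolation vary), and this is not either of the paper's Bayesian spline methods. The function name is illustrative.

```python
import numpy as np


def weighted_quantile(y, pi, p):
    """Invert the design-weighted empirical CDF at probability p (illustrative sketch).

    y: sampled values; pi: their inclusion probabilities; p: quantile level in (0, 1].
    Returns the smallest sampled y whose weighted CDF is >= p (no interpolation).
    """
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(pi, dtype=float)   # design weights
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    order = np.argsort(y)
    y_sorted = y[order]
    cdf = np.cumsum(w[order]) / w.sum()     # weighted empirical CDF at each sorted value
    idx = int(np.searchsorted(cdf, p, side="left"))
    return float(y_sorted[min(idx, y_sorted.size - 1)])  # guard against rounding at p = 1
```

Weighting matters: with y = [10, 20, 30] and inclusion probabilities [0.5, 0.25, 0.25], the 0.3 quantile is 20, whereas equal inclusion probabilities give 10.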

Bayesian penalized spline model-based inference for finite population proportion in unequal probability sampling.

IF 0.9, Zone 4 (Mathematics)

Survey Methodology, Pub Date: 2010-06-01, Epub Date: 2010-06-29

Qixuan Chen, Michael R Elliott, Roderick J A Little

Abstract: We propose a Bayesian Penalized Spline Predictive (BPSP) estimator for a finite population proportion in an unequal probability sampling setting. This new method allows the probabilities of inclusion to be directly incorporated into the estimation of a population proportion, using a probit regression of the binary outcome on the penalized spline of the inclusion probabilities. The posterior predictive distribution of the population proportion is obtained using Gibbs sampling. The advantages of the BPSP estimator over the Hájek (HK), Generalized Regression (GR), and parametric model-based prediction estimators are demonstrated by simulation studies and a real example in tax auditing. Simulation studies show that the BPSP estimator is more efficient, and its 95% credible interval provides better confidence coverage with shorter average width than the HK and GR estimators, especially when the population proportion is close to zero or one or when the sample is small. Compared to linear model-based predictive estimators, the BPSP estimators are robust to model misspecification and influential observations in the sample.

Survey Methodology 36(1): 23-34. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5708555/pdf/nihms921230.pdf

Citations: 0
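The Hájek (HK) estimator named in this abstract as a comparator divides the inverse-probability-weighted total by the sum of the weights; for a binary outcome it estimates the population proportion. A minimal sketch follows. It is the design-based benchmark, not the BPSP estimator, and the function name is illustrative.

```python
import numpy as np


def hajek_proportion(y, pi):
    """Hajek estimator of a finite population proportion (illustrative sketch).

    y: binary outcomes for sampled units; pi: their inclusion probabilities.
    """
    y = np.asarray(y, dtype=float)
    pi = np.asarray(pi, dtype=float)
    if y.shape != pi.shape or np.any((pi <= 0) | (pi > 1)):
        raise ValueError("need matching arrays and inclusion probabilities in (0, 1]")
    w = 1.0 / pi                        # design weights
    return float(np.sum(w * y) / np.sum(w))
```

For example, hajek_proportion([1, 0], [0.1, 0.5]) gives 10/12, about 0.833, because the unit with the smaller inclusion probability represents more of the population.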

Optimal sample allocation for design-consistent regression in a cancer services survey when design variables are known for aggregates.

IF 1.2, Zone 4 (Mathematics)

Survey Methodology, Pub Date: 2008-06-01

Alan M Zaslavsky, Hui Zheng, John Adams

Abstract: We consider optimal sampling rates in element-sampling designs when the anticipated analysis is survey-weighted linear regression and the estimands of interest are linear combinations of regression coefficients from one or more models. Methods are first developed assuming that exact design information is available in the sampling frame and then generalized to situations in which some design variables are available only as aggregates for groups of potential subjects, or from inaccurate or old data. We also consider design for estimation of combinations of coefficients from more than one model. A further generalization allows for flexible combinations of coefficients chosen to improve estimation of one effect while controlling for another. Potential applications include estimation of means for several sets of overlapping domains, or improving estimates for subpopulations such as minority races by disproportionate sampling of geographic areas. In the motivating problem of designing a survey on care received by cancer patients (the CanCORS study), potential design information included block-level census data on race/ethnicity and poverty as well as individual-level data. In one study site, an unequal-probability sampling design using the subjects' residential addresses and census data would have reduced the variance of the estimator of an income effect by 25%, or by 38% if the subjects' races were also known. With flexible weighting of the income contrasts by race, the variance of the estimator would be reduced by 26% using residential addresses alone and by 52% using addresses and races. Our methods would be useful in studies in which geographic oversampling by race-ethnicity or socioeconomic characteristics is considered, or in any study in which characteristics available in sampling frames are measured with error.

Survey Methodology 34(1): 65-78. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2725367/pdf/nihms-105215.pdf

Citations: 0

Estimation of the Distribution of Hourly Pay from Household Survey Data: The Use of Missing Data Methods to Handle Measurement Error.

IF 0.9, Zone 4 (Mathematics)

Survey Methodology, Pub Date: 2003-05-22, DOI: 10.1920/WP.CEM.2003.1203

G. Beissel-Durrant, C. Skinner

Citations: 2