Title: Using Auxiliary Information in Probability Survey Data to Improve Pseudo-Weighting in Nonprobability Samples: A Copula Model Approach
Authors: Tingyu Zhu, Laura J Gamble, Matthew Klapman, Lan Xue, Virginia M Lesser
Journal of Survey Statistics and Methodology, published 2023-09-12. DOI: 10.1093/jssam/smad032

Abstract: While probability sampling has been considered the gold standard of survey methods, nonprobability sampling is increasingly popular because of its convenience and low cost. However, nonprobability samples can yield biased estimates because the underlying selection mechanism is unknown. In this article, we propose parametric and semiparametric approaches to integrating probability and nonprobability samples using common ancillary variables observed in both samples. In the parametric approach, the joint distribution of the ancillary variables is assumed to follow a latent Gaussian copula model, which is flexible enough to accommodate both categorical and continuous variables. The semiparametric approach, in contrast, requires no assumptions about the distribution of the ancillary variables. In addition, logistic regression is used to model the mechanism by which population units enter the nonprobability sample. The unknown copula parameters are estimated by pseudo maximum likelihood, and the logistic regression model is estimated by maximizing the sample likelihood constructed from the nonprobability sample. We evaluate the proposed method in the context of estimating the population mean. Simulation results show that the method corrects the selection bias in the nonprobability sample by consistently estimating the underlying inclusion mechanism, and that, by incorporating the additional information in the nonprobability sample, the combined method estimates the population mean more efficiently than using the probability sample alone. A real-data application illustrates the practical use of the proposed method.
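The pseudo-weighting idea in this abstract rests on a familiar building block: model the probability that a unit enters the nonprobability sample, then weight each unit by the inverse of its estimated propensity. A minimal stdlib-only sketch of that generic idea, not the authors' copula-based estimator; the one-predictor model, `fit_logistic`, and the toy numbers are illustrative assumptions:

```python
import math

def sigmoid(z):
    """Logistic function, the link used in the inclusion model."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, lr=0.5, steps=5000):
    """Fit P(S=1 | x) = sigmoid(a + b*x) by gradient ascent on the
    Bernoulli log-likelihood (y indicates sample membership)."""
    a, b = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        ga = gb = 0.0
        for xi, yi in zip(x, y):
            resid = yi - sigmoid(a + b * xi)  # score contribution of one unit
            ga += resid
            gb += resid * xi
        a += lr * ga / n
        b += lr * gb / n
    return a, b

def pseudo_weighted_mean(y, propensities):
    """Inverse-propensity (Hajek-style) weighted estimate of the mean."""
    w = [1.0 / p for p in propensities]
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
```

Units that were unlikely to volunteer receive large weights, pulling the estimate back toward the population mean; the article's contribution is estimating those propensities consistently when the selection mechanism is unknown.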
Title: Incorporating Adaptive Survey Design in a Two-Stage National Web or Mail Mixed-Mode Survey: An Experiment in the American Family Health Study
Authors: Shiyu Zhang, Brady T West, James Wagner, Rebecca Gatward
Journal of Survey Statistics and Methodology, published 2023-09-12. DOI: 10.1093/jssam/smad035

Abstract: This article presents the results of an adaptive design experiment in the recruitment of households and individuals for a two-stage national probability web or mail mixed-mode survey, the American Family Health Study (AFHS). In the screening stage, we based the adaptive design's subgroup differentiation on Esri Tapestry segmentation: we used tailored invitation materials for a subsample in which a high proportion of the population was Hispanic, and we added a paper questionnaire to the initial mailing for a subsample with rural and older families. In the main-survey stage, the adaptive design targeted households in which a member other than the screening respondent was selected for the survey; the adaptations included emailing and/or texting, an additional prepaid incentive, and asking screening respondents to remind the selected individuals. The main research questions are (i) whether the adaptive design improved survey production outcomes and (ii) whether combining adaptive design with postsurvey weighting adjustments improved survey estimates compared to postsurvey adjustments alone. The adaptive designs did not improve the survey production outcomes. We found that the weighted AFHS estimates closely resemble those of a benchmark national face-to-face survey, the National Survey of Family Growth, although the adaptive design did not change survey estimates beyond the weighting adjustments. Nonetheless, our experiment yields useful insights about implementing adaptive design in a self-administered, mail-recruited web or mail survey. We were able to identify subgroups with potentially lower response rates and distinctive characteristics, but developing effective protocol adaptations for these subgroups proved challenging under the constraints of the two primary survey modes and the operational budget of the AFHS. In addition, for self-administered within-household selection, it was difficult to obtain contact information for, reach, and recruit selected household members who did not respond to the screening interview.
Title: Joint Imputation of General Data
Author: Michael W Robbins
Journal of Survey Statistics and Methodology, published 2023-09-12. DOI: 10.1093/jssam/smad034

Abstract: High-dimensional complex survey data of general structure (e.g., containing continuous, binary, categorical, and ordinal variables), such as the US Department of Defense's Health-Related Behaviors Survey (HRBS), often confound procedures designed to impute missing survey data. Imputation by fully conditional specification (FCS) is often considered the state of the art for such datasets because of its generality and flexibility. However, FCS procedures contain a theoretical flaw that the HRBS data expose: HRBS imputations created with FCS are shown to diverge across iterations of Markov chain Monte Carlo. Imputation by joint modeling lacks this flaw, but current joint modeling procedures are neither general nor flexible enough to handle HRBS data. We therefore introduce an algorithm that efficiently and flexibly applies multiple imputation by joint modeling to data of general structure. The procedure draws imputations from a latent joint multivariate normal model that underpins the generally structured data and models the latent data via a sequence of conditional linear models whose predictors can be specified by the user. Rigorous evaluations show that HRBS imputations created with the new algorithm are convergent and of high quality, and simulations verify that the proposed method performs well compared to existing algorithms, including FCS.
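The elementary building block behind joint-model imputation of this kind is drawing a missing value from the conditional distribution implied by a latent multivariate normal. In the bivariate case the conditional moments have a closed form; a minimal illustration (the function names are mine, and the article's algorithm handles general dimensions and mixed variable types):

```python
import random

def conditional_params(mu1, mu2, s11, s12, s22, x1):
    """Mean and variance of X2 | X1 = x1 when (X1, X2) is bivariate normal
    with means (mu1, mu2), variances (s11, s22), and covariance s12."""
    cond_mean = mu2 + (s12 / s11) * (x1 - mu1)
    cond_var = s22 - s12 * s12 / s11
    return cond_mean, cond_var

def impute_x2(mu1, mu2, s11, s12, s22, x1, rng=random):
    """Draw one imputation for a missing X2 given an observed X1."""
    m, v = conditional_params(mu1, mu2, s11, s12, s22, x1)
    return rng.gauss(m, v ** 0.5)
```

Stronger covariance `s12` shifts the conditional mean further toward the observed value and leaves less residual variance, which is why a joint model propagates information across variables coherently.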
Title: Proxy Survey Cost Indicators in Interviewer-Administered Surveys: Are They Actually Correlated with Costs?
Authors: James Wagner, Lena Centeno, Richard Dulaney, Brad Edwards, Z Tuba Suzer-Gurtekin, Stephanie Coffey
Journal of Survey Statistics and Methodology, published 2023-08-30. DOI: 10.1093/jssam/smad028

Abstract: Survey design decisions are, by their very nature, tradeoffs between costs and errors, yet measuring costs is often difficult. Surveys are also growing more complex, and many require that cost information be available for decisions made during data collection. These complexities create new challenges for monitoring and understanding survey costs: cost information often lags behind the reporting of paradata, and in some situations costs are difficult to measure at the case level. Given this time lag and the difficulty of assigning costs directly to cases, survey designers and managers have frequently turned to proxy cost indicators, often based on level-of-effort paradata; an example is the number of attempts per interview. Unfortunately, little is known about how accurately these proxy indicators mirror the true costs of a survey. In this article, we examine a set of proxy indicators across several surveys with different designs, including different interview modes, and we measure the strength of their correlation with two cost measures: total project cost and total interviewer hours. The article provides initial evidence about the quality of these proxies as surrogates for true costs, using data from several surveys with interviewer-administered modes (telephone, face to face) across three organizations (the University of Michigan's Survey Research Center, Westat, and the US Census Bureau). We find that some indicators (total attempts, total contacts, total completes, sample size) are correlated with total costs across several surveys (average correlation ∼0.60) and strongly correlated with total interviewer hours (average correlation ∼0.82). For survey components, three indicators (total attempts, sample size, and total miles) are strongly correlated with both total costs (average correlation ∼0.77) and total interviewer hours (average correlation ∼0.86).
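The correlations this abstract reports are ordinary Pearson correlations between a level-of-effort indicator (e.g., total attempts) and a cost measure. A stdlib-only sketch of the computation, with invented paradata in place of the surveys' actual figures:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical paradata: total attempts per survey vs. total interviewer hours.
attempts = [120, 340, 210, 500, 95]
hours = [80, 260, 150, 390, 70]
```

A proxy is useful for cost monitoring exactly to the extent that this coefficient is high and stable across surveys, which is the empirical question the article tests.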
Title: Total Bias in Income Surveys when Nonresponse and Measurement Errors are Correlated
Authors: Andrea Neri, Eleonora Porreca
Journal of Survey Statistics and Methodology, published 2023-08-29. DOI: 10.1093/jssam/smad027

Abstract: Household income surveys can suffer from quality limitations, mainly due to the difficulty of enrolling households (unit nonresponse) and of retrieving correct information during the interview (measurement error, ME). These errors are likely to be correlated because of latent factors such as the threat of disclosing personal information, the perceived sensitivity of the topic, or social desirability. For survey organizations, it is crucial to assess the interplay of these errors and their impact on the accuracy and precision of inferences drawn from the data. In this article, we propose using a standard sample selection model within a total survey error framework to handle correlated nonresponse error (NR) and ME in estimating average household income. We use it to study the correlation between the two errors, quantify the ME component due to this correlation, and evaluate ME among nonrespondents. Using the Italian Survey on Income and Wealth linked with administrative income data from tax returns, we find a positive correlation between the two errors, driven mainly by households at the extremes of the income distribution. Our results show that ME contributes more to the total error than unit nonresponse and that it would be larger in the absence of the correlation between the two errors. Finally, efforts to reduce nonresponse rates are worthwhile only for nonrespondents in the lowest estimated response propensity group: if these households participate, the bias decreases because the reduction in NR offsets the increase in ME.
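With linked administrative ("true") values, splitting the bias of a respondent mean into a nonresponse piece and a measurement piece is simple arithmetic. A toy sketch of that accounting identity (all numbers invented; the article's selection model goes well beyond this decomposition):

```python
def bias_decomposition(true_all, true_resp, reported_resp):
    """Split the bias of the respondent mean into nonresponse (NR) and
    measurement-error (ME) components:
      NR = mean(true, respondents) - mean(true, full sample)
      ME = mean(reported, respondents) - mean(true, respondents)
    Total bias = NR + ME."""
    mean = lambda v: sum(v) / len(v)
    nr = mean(true_resp) - mean(true_all)
    me = mean(reported_resp) - mean(true_resp)
    return nr, me, nr + me
```

In the test below a downward nonresponse bias is partly offset by upward misreporting among respondents, the kind of interplay between the two errors that the article quantifies.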
Title: Using Auxiliary Marginal Distributions in Imputations for Nonresponse while Accounting for Survey Weights, with Application to Estimating Voter Turnout
Authors: Jiurui Tang, D Sunshine Hillygus, Jerome P Reiter
Journal of Survey Statistics and Methodology, published 2023-08-17. DOI: 10.1093/jssam/smad033

Abstract: In many survey settings, population counts or percentages are available for some of the survey variables, for example from censuses, administrative databases, or other high-quality surveys. We present a model-based approach to utilizing such auxiliary marginal distributions in multiple imputation for unit and item nonresponse in complex surveys, ensuring that the imputations produce design-based estimates that are plausible given the known margins. We introduce a hybrid missingness model comprising a pattern mixture model for unit nonresponse and selection models for item nonresponse, and we develop a computational strategy for estimating its parameters and generating imputations. We apply a hybrid missingness model to examine voter turnout by subgroup using the 2018 Current Population Survey for North Carolina. The hybrid missingness model also facilitates modeling measurement errors simultaneously with handling missing values; we illustrate this feature in the voter turnout application by examining how results change when we allow for overreporting, that is, individuals self-reporting that they voted when in fact they did not.
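On the weighting side, the simplest way to force estimates to respect a known auxiliary margin is a raking step: rescale the weights so the weighted group shares match the population shares. The article works on the imputation side instead, but one raking step conveys the constraint being imposed (the function and the voted/not-voted toy data are illustrative):

```python
def rake_step(weights, groups, target_props):
    """Rescale weights so the weighted distribution of `groups`
    matches the known population proportions in `target_props`."""
    total = sum(weights)
    group_tot = {}
    for w, g in zip(weights, groups):
        group_tot[g] = group_tot.get(g, 0.0) + w
    factor = {g: target_props[g] * total / group_tot[g] for g in group_tot}
    return [w * factor[g] for w, g in zip(weights, groups)]
```

After the step, the weighted share of each group equals its known margin exactly, which is the property the article's imputations are designed to preserve for design-based estimates.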
Title: Variable Inclusion Strategies for Effective Quota Sampling and Propensity Modeling: An Application to SARS-CoV-2 Infection Prevalence Estimation
Authors: Yan Li, M. Fay, Sally A. Hunsberger, B. Graubard
Journal of Survey Statistics and Methodology, published 2023-08-08. DOI: 10.1093/jssam/smad026

Abstract: Public health policymakers must make crucial decisions rapidly during a pandemic, and in such situations accurate measurements from health surveys are essential. Under tight time and resource constraints, it may be infeasible to implement a probability-based sample that yields high response rates. An alternative is to select a quota sample from a large pool of volunteers, with selection based on the census distributions of available (often demographic) variables, known as quota variables. In practice, however, census data may contain only a subset of the required predictor variables. The realized quota sample can therefore be adjusted by propensity score pseudoweighting using a "reference" probability-based survey that contains more predictor variables. Motivated by the SARS-CoV-2 serosurvey (a quota sample conducted in 2020 by the National Institutes of Health), we identify the condition under which the quota variables can be ignored in constructing the propensity model while still producing nearly unbiased estimates of population means. We conduct limited simulations to evaluate the bias and variance reduction properties of alternative weighting strategies for quota sample estimates under three propensity models that account for varying sets of predictors and degrees of correlation among the predictor sets, and we then apply our findings to the empirical data.
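Quota selection keys the sample to census distributions of the quota variables, and the matching estimator is a poststratified mean: each quota cell's sample mean weighted by its census share. A minimal sketch of that estimator (cell labels and numbers are invented, and this is the textbook device rather than the article's pseudoweighting strategy):

```python
def poststratified_mean(values, cells, census_share):
    """Combine per-cell sample means using known census proportions."""
    sums, counts = {}, {}
    for v, c in zip(values, cells):
        sums[c] = sums.get(c, 0.0) + v
        counts[c] = counts.get(c, 0) + 1
    # Weight each cell mean by its share of the census population.
    return sum(census_share[c] * sums[c] / counts[c] for c in sums)
```

The estimator is unbiased only if volunteers within each cell resemble the cell's population, which is why the article asks when additional propensity modeling is needed and when the quota variables can be ignored in it.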
Title: Discussion of the 2022 Hansen Lecture: "The Evolution of the Use of Models in Survey Sampling"
Author: F. Breidt
Journal of Survey Statistics and Methodology, published 2023-07-26. DOI: 10.1093/jssam/smad030

Abstract: The 2022 Hansen Lecture gave a broad overview of the use of models in survey sampling, with emphasis on modeling approaches to incorporating auxiliary information in survey estimators. This discussion expands on some issues in model-assisted estimation, exploring data needs and the availability of multipurpose weights for advanced modeling methods.
Title: Estimating the Size of Clustered Hidden Populations
Authors: Laura J Gamble, L. Johnston, P. Pham, P. Vinck, Katherine R. McLaughlin
Journal of Survey Statistics and Methodology, published 2023-07-17. DOI: 10.1093/jssam/smad025

Abstract: Successive sampling population size estimation (SS-PSE) is used by government agencies, aid organizations, and researchers around the world to estimate the size of hidden populations from respondent-driven sampling surveys. SS-PSE addresses a pressing need, since many countries rely on accurate size estimates to plan and allocate finite resources for hidden populations. However, SS-PSE relies on several assumptions, one of which requires the underlying social network of the hidden population to be fully connected. We propose two modifications of SS-PSE for estimating the size of hidden populations whose underlying social network consists of disjoint clusters. The first is a theoretically straightforward extension of SS-PSE, but it relies on prior information that may be difficult to obtain in practice. The second extends the Bayesian SS-PSE model by introducing a new set of parameters that allow clustered estimation without requiring the additional prior information. After providing theoretical justification for both methods, we assess their performance in simulations and apply the clustered SS-PSE method to a population of internally displaced persons in Bamako, Mali.
Title: Multivariate Small-area Estimation for Mixed-type Response Variables With Item Nonresponse
Authors: Haoliang Sun, Emily J. Berg, Zhengyuan Zhu
Journal of Survey Statistics and Methodology, published 2023-06-28. DOI: 10.1093/jssam/smad018

Abstract: Many surveys collect information on both discrete characteristics and continuous variables, that is, mixed-type variables. Small-area statistics of interest include means or proportions of the response variables as well as their domain means, the mean values at each level of a different categorical variable. Item nonresponse in survey data, however, increases the complexity of small-area estimation. To address this issue, we propose a multivariate mixed-effects model for mixed-type response variables subject to item nonresponse, and we apply the method to two data structures in which the data are missing completely at random by design. We use empirical data from two separate studies: a survey of pet owners and a dataset from the National Resources Inventory. In these applications, the proposed method improves on both a direct estimator and a predictor based on a univariate model.
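Small-area predictors of the kind this abstract compares against a direct estimator typically shrink the noisy direct estimate toward a model-based synthetic value, with more shrinkage where the direct estimate is more variable. A generic Fay-Herriot-style composite, not the authors' multivariate mixed-effects model; the variances are treated as known for illustration:

```python
def composite_estimate(direct, synthetic, sampling_var, model_var):
    """Shrinkage combination of a direct survey estimate and a synthetic
    (model-based) estimate. The weight on the direct estimate grows as its
    sampling variance shrinks relative to the between-area model variance."""
    gamma = model_var / (model_var + sampling_var)
    return gamma * direct + (1.0 - gamma) * synthetic
```

A large area with a precise direct estimate keeps it nearly unchanged, while a small area with a noisy one borrows strength from the model, which is the core tradeoff in small-area estimation.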