Responsive and Adaptive Designs in Repeated Cross-National Surveys: A Simulation Study
Hafsteinn Einarsson, Alexandru Cernat, Natalie Shlomo
Journal of Survey Statistics and Methodology, 2023-10-27. DOI: 10.1093/jssam/smad038

Abstract: Cross-national surveys run the risk of differential survey errors, where the quality of the collected data varies from country to country. Responsive and adaptive survey designs (RASDs) have been proposed as a way to reduce survey errors by leveraging auxiliary variables to inform fieldwork efforts, but they have rarely been considered in the context of cross-national surveys. Using data from the European Social Survey, we simulate fieldwork in a repeated cross-national survey under RASDs in which fieldwork efforts are ended early for selected units in the final stage of data collection. Demographic variables, paradata (interviewer observations), and contact data are used to inform fieldwork efforts. Eight combinations of response propensity models and selection mechanisms are evaluated in terms of sample composition (as measured by the coefficient of variation of response propensities), response rates, number of contact attempts saved, and effects on estimates of target variables in the survey. We find that sample balance can be improved in many country-round combinations. Response rates can be increased marginally, and targeting high-propensity respondents could lead to significant cost savings from making fewer contact attempts. Estimates of target variables are not changed by the case prioritizations used in the simulations, indicating that they do not affect nonresponse bias. We conclude that RASDs should be considered in cross-national surveys, but that more work is needed to identify suitable covariates to inform fieldwork efforts.

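The sample-balance criterion named in the abstract, the coefficient of variation of response propensities, is simple to compute: the standard deviation of the estimated propensities divided by their mean, with lower values indicating a more balanced respondent pool. A minimal sketch with illustrative propensity values (the helper name `propensity_cv` is ours, not the authors'):

```python
import statistics

def propensity_cv(propensities):
    """Coefficient of variation of response propensities:
    standard deviation over mean. Lower values indicate a
    better-balanced respondent pool."""
    return statistics.pstdev(propensities) / statistics.fmean(propensities)

balanced = [0.5, 0.5, 0.5, 0.5]
unbalanced = [0.1, 0.3, 0.7, 0.9]
print(propensity_cv(balanced))    # 0.0
print(propensity_cv(unbalanced))  # ≈ 0.632
```

Here `pstdev` (the population standard deviation) is used; with the sample standard deviation the values differ by a constant factor and the ranking of designs is unchanged.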
{"title":"A Mixture Model Approach to Assessing Measurement Error in Surveys Using Reinterviews","authors":"Simon Hoellerbauer","doi":"10.1093/jssam/smad037","DOIUrl":"https://doi.org/10.1093/jssam/smad037","url":null,"abstract":"Abstract Researchers are often unsure about the quality of the data collected by third-party actors, such as survey firms. This may be because of the inability to measure data quality effectively at scale and the difficulty with communicating which observations may be the source of measurement error. Researchers rely on survey firms to provide them with estimates of data quality and to identify observations that are problematic, potentially because they have been falsified or poorly collected. To address these issues, I propose the QualMix model, a mixture modeling approach to deriving estimates of survey data quality in situations in which two sets of responses exist for all or certain subsets of respondents. I apply this model to the context of survey reinterviews, a common form of data quality assessment used to detect falsification and data collection problems during enumeration. Through simulation based on real-world data, I demonstrate that the model successfully identifies incorrect observations and recovers latent enumerator and survey data quality. 
I further demonstrate the model’s utility by applying it to reinterview data from a large survey fielded in Malawi, using it to identify significant variation in data quality across observations generated by different enumerators.","PeriodicalId":17146,"journal":{"name":"Journal of Survey Statistics and Methodology","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135739888","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
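The QualMix model itself is richer than this, but its core idea, inferring latent quality classes from interview/reinterview agreement, can be sketched as a two-component binomial mixture fit by EM. Everything below (the data, the function name, the starting values) is an illustration of the general technique, not the paper's implementation:

```python
def em_binomial_mixture(data, iters=500):
    """EM for a two-component binomial mixture.
    data: list of (matches, reinterviews) pairs, one per enumerator,
    where 'matches' counts answers agreeing between the original
    interview and the reinterview.
    Returns (share of high-quality enumerators, p_high, p_low)."""
    pi, p_hi, p_lo = 0.5, 0.9, 0.5   # starting values; p_hi > p_lo fixes the labels
    for _ in range(iters):
        # E-step: posterior probability that each enumerator is high quality
        resp = []
        for m, n in data:
            a = pi * p_hi**m * (1 - p_hi) ** (n - m)
            b = (1 - pi) * p_lo**m * (1 - p_lo) ** (n - m)
            resp.append(a / (a + b))
        # M-step: update the mixing weight and the two agreement rates
        pi = sum(resp) / len(resp)
        p_hi = (sum(r * m for r, (m, n) in zip(resp, data))
                / sum(r * n for r, (m, n) in zip(resp, data)))
        p_lo = (sum((1 - r) * m for r, (m, n) in zip(resp, data))
                / sum((1 - r) * n for r, (m, n) in zip(resp, data)))
    return pi, p_hi, p_lo

# Fabricated example: three enumerators with high agreement, three with low
reinterviews = [(19, 20), (18, 20), (17, 20), (9, 20), (11, 20), (10, 20)]
share_hi, p_hi, p_lo = em_binomial_mixture(reinterviews)
```

On this toy data the fit recovers two well-separated agreement rates, flagging the second group of enumerators as a potential source of measurement error.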
Using Auxiliary Information in Probability Survey Data to Improve Pseudo-Weighting in Nonprobability Samples: A Copula Model Approach
Tingyu Zhu, Laura J Gamble, Matthew Klapman, Lan Xue, Virginia M Lesser
Journal of Survey Statistics and Methodology, 2023-09-12. DOI: 10.1093/jssam/smad032

Abstract: While probability sampling has been considered the gold standard of survey methods, nonprobability sampling is increasingly popular due to its convenience and low cost. However, nonprobability samples can lead to biased estimates due to the unknown nature of the underlying selection mechanism. In this article, we propose parametric and semiparametric approaches to integrate probability and nonprobability samples using common ancillary variables observed in both samples. In the parametric approach, the joint distribution of ancillary variables is assumed to follow the latent Gaussian copula model, which is flexible enough to accommodate both categorical and continuous variables. In contrast, the semiparametric approach requires no assumptions about the distribution of ancillary variables. In addition, logistic regression is used to model the mechanism by which population units enter the nonprobability sample. The unknown parameters in the copula model are estimated through the pseudo maximum likelihood approach. The logistic regression model is estimated by maximizing the sample likelihood constructed from the nonprobability sample. The proposed method is evaluated in the context of estimating the population mean. Our simulation results show that the proposed method is able to correct the selection bias in the nonprobability sample by consistently estimating the underlying inclusion mechanism. By incorporating additional information from the nonprobability sample, the combined method can estimate the population mean more efficiently than using the probability sample alone. A real-data application is provided to illustrate the practical use of the proposed method.

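The correction principle behind pseudo-weighting can be seen in a stylized simulation: when inclusion in the nonprobability sample depends on a covariate, the unweighted mean is biased, and weighting each unit by the inverse of its inclusion propensity removes the bias. This sketch assumes the propensities are known; in the article they must be estimated via the copula and sample-likelihood machinery:

```python
import random

random.seed(42)

# Finite population where y depends on x
N = 100_000
pop = [(x, 2 * x + random.gauss(0, 0.1))
       for x in (random.random() for _ in range(N))]
true_mean = sum(y for _, y in pop) / N

# Nonprobability "sample": inclusion probability rises with x,
# so high-x (and hence high-y) units are over-represented
sample = []
for x, y in pop:
    p = 0.1 + 0.8 * x                 # known here; estimated in practice
    if random.random() < p:
        sample.append((y, 1 / p))     # keep the pseudo-weight 1/p

naive = sum(y for y, _ in sample) / len(sample)
ipw = sum(w * y for y, w in sample) / sum(w for _, w in sample)
print(f"true {true_mean:.3f}  naive {naive:.3f}  weighted {ipw:.3f}")
```

The naive estimate overshoots the population mean by roughly a quarter, while the pseudo-weighted estimate lands on target.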
Incorporating Adaptive Survey Design in a Two-Stage National Web or Mail Mixed-Mode Survey: An Experiment in the American Family Health Study
Shiyu Zhang, Brady T West, James Wagner, Rebecca Gatward
Journal of Survey Statistics and Methodology, 2023-09-12. DOI: 10.1093/jssam/smad035

Abstract: This article presents the results of an adaptive design experiment in the recruitment of households and individuals for a two-stage national probability web or mail mixed-mode survey, the American Family Health Study (AFHS). In the screening stage, we based the adaptive design's subgroup differentiation on Esri Tapestry segmentation. We used tailored invitation materials for a subsample where a high proportion of the population was Hispanic and added a paper questionnaire to the initial mailing for a subsample with rural and older families. In the main-survey stage, the adaptive design targeted the households where a member other than the screening respondent was selected for the survey. The adaptations included emailing and/or texting, an additional prepaid incentive, and seeking screening respondents' help to remind the selected individuals. The main research questions are (i) whether the adaptive design improved survey production outcomes and (ii) whether combining adaptive design and postsurvey weighting adjustments improved survey estimates compared to performing postsurvey adjustments alone. Unfortunately, the adaptive designs did not improve the survey production outcomes. We found that the weighted AFHS estimates closely resemble those of a benchmark national face-to-face survey, the National Survey of Family Growth, although the adaptive design did not additionally change survey estimates beyond the weighting adjustments. Nonetheless, our experiment yields useful insights about the implementation of adaptive design in a self-administered mail-recruit web or mail survey. We were able to identify subgroups with potentially lower response rates and distinctive characteristics, but it was challenging to develop effective protocol adaptations for these subgroups under the constraints of the two primary survey modes and the operational budget of the AFHS. In addition, for self-administered within-household selection, it was difficult to obtain contact information from, reach, and recruit selected household members who did not respond to the screening interview.

{"title":"Joint Imputation of General Data","authors":"Michael W Robbins","doi":"10.1093/jssam/smad034","DOIUrl":"https://doi.org/10.1093/jssam/smad034","url":null,"abstract":"Abstract High-dimensional complex survey data of general structures (e.g., containing continuous, binary, categorical, and ordinal variables), such as the US Department of Defense’s Health-Related Behaviors Survey (HRBS), often confound procedures designed to impute any missing survey data. Imputation by fully conditional specification (FCS) is often considered the state of the art for such datasets due to its generality and flexibility. However, FCS procedures contain a theoretical flaw that is exposed by HRBS data—HRBS imputations created with FCS are shown to diverge across iterations of Markov Chain Monte Carlo. Imputation by joint modeling lacks this flaw; however, current joint modeling procedures are neither general nor flexible enough to handle HRBS data. As such, we introduce an algorithm that efficiently and flexibly applies multiple imputation by joint modeling in data of general structures. This procedure draws imputations from a latent joint multivariate normal model that underpins the generally structured data and models the latent data via a sequence of conditional linear models, the predictors of which can be specified by the user. We perform rigorous evaluations of HRBS imputations created with the new algorithm and show that they are convergent and of high quality. 
Lastly, simulations verify that the proposed method performs well compared to existing algorithms including FCS.","PeriodicalId":17146,"journal":{"name":"Journal of Survey Statistics and Methodology","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135878956","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
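The heart of any joint-modeling imputer is drawing missing values from their conditional distribution under a latent multivariate normal. A minimal bivariate sketch of that single step (standard normal margins, correlation `rho`; the full algorithm layers user-specified conditional linear models and general variable types on top of this):

```python
import numpy as np

rng = np.random.default_rng(0)

# Latent bivariate normal: standard margins, correlation rho
rho = 0.7
cov = np.array([[1.0, rho], [rho, 1.0]])
data = rng.multivariate_normal([0.0, 0.0], cov, size=5000)

# Suppose x2 is missing for a random half of the cases
miss = rng.random(5000) < 0.5
x1_obs = data[miss, 0]

# Draw imputations from the conditional x2 | x1 ~ N(rho * x1, 1 - rho^2)
imputed = rho * x1_obs + np.sqrt(1 - rho**2) * rng.standard_normal(miss.sum())
```

Because the draws come from the correct conditional distribution, the imputed values preserve the joint structure (the correlation between x1 and the imputed x2 stays near `rho`), which is precisely what single-variable mean imputation would destroy.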
Proxy Survey Cost Indicators in Interviewer-Administered Surveys: Are They Actually Correlated with Costs?
James Wagner, Lena Centeno, Richard Dulaney, Brad Edwards, Z Tuba Suzer-Gurtekin, Stephanie Coffey
Journal of Survey Statistics and Methodology, 2023-08-30. DOI: 10.1093/jssam/smad028

Abstract: Survey design decisions are, by their very nature, tradeoffs between costs and errors. However, measuring costs is often difficult. Furthermore, surveys are growing more complex. Many surveys require that cost information be available to make decisions during data collection. These complexities create new challenges for monitoring and understanding survey costs. Often, survey cost information lags behind reporting of paradata. Furthermore, in some situations, the measurement of costs at the case level is difficult. Given the time lag in reporting cost information and the difficulty of assigning costs directly to cases, survey designers and managers have frequently turned to proxy indicators for cost. These proxy measures are often based upon level-of-effort paradata; an example is the number of attempts per interview. Unfortunately, little is known about how accurately these proxy indicators actually mirror the true costs of the survey. In this article, we examine a set of these proxy indicators across several surveys with different designs, including different modes of interview. We examine the strength of correlation between these indicators and two different measures of costs: the total project cost and total interviewer hours. This article provides some initial evidence about the quality of these proxies as surrogates for the true costs using data from several different surveys with interviewer-administered modes (telephone, face to face) across three organizations (University of Michigan's Survey Research Center, Westat, US Census Bureau). We find that some indicators (total attempts, total contacts, total completes, sample size) are correlated (average correlation ~0.60) with total costs across several surveys. These same indicators are strongly correlated (average correlation ~0.82) with total interviewer hours. For survey components, three indicators (total attempts, sample size, and total miles) are strongly correlated with both total costs (average correlation ~0.77) and with total interviewer hours (average correlation ~0.86).

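Assessing a proxy of this kind comes down to a correlation computation between the indicator and a cost measure. A tiny sketch with fabricated per-survey totals (the numbers are invented; only the mechanics mirror the article's analysis):

```python
import statistics

def pearson(a, b):
    """Pearson correlation between a proxy indicator and a cost measure."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

# Illustrative (fabricated) totals for five surveys
attempts = [1200, 800, 1500, 600, 1000]   # proxy: total contact attempts
hours = [950, 700, 1300, 500, 820]        # cost: total interviewer hours
print(pearson(attempts, hours))           # close to 1 when the proxy tracks cost
```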
{"title":"Total Bias in Income Surveys when Nonresponse and Measurement Errors are Correlated","authors":"Andrea Neri, Eleonora Porreca","doi":"10.1093/jssam/smad027","DOIUrl":"https://doi.org/10.1093/jssam/smad027","url":null,"abstract":"Abstract Household surveys on income might suffer from quality limitations mainly due to the difficulty of enrolling households (unit nonresponse) and retrieving correct information during the interview (measurement error [ME]). These errors are likely to be correlated because of latent factors, such as the threat of disclosing personal information, the perceived sensitivity of the topic, or social desirability. For survey organizations, assessing the interplay of these errors and their impact on the accuracy and precision of inferences derived from their data is crucial. In this article, we propose to use a standard sample selection model within a total survey error framework to deal with the case of correlated nonresponse error (NR) and ME in estimating average household income. We use it to study the correlation between the two errors, quantify the ME component due to this correlation, and evaluate ME among nonrespondents. Using the Italian Survey on Income and Wealth linked with administrative income data from tax returns, we find a positive correlation between the two errors and that households at the extremes of the income distribution mainly cause this association. Our results show that ME contributes more to the total error than unit nonresponse and that it would be larger in absence of the correlation between the two errors. Finally, efforts to reduce nonresponse rates are worthwhile only for nonrespondents in the lowest estimated response propensity group. 
If these households participate, the bias decreases because of the reduction in NR that offsets the increase in ME.","PeriodicalId":17146,"journal":{"name":"Journal of Survey Statistics and Methodology","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136242795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Interviewer Effects on the Measurement of Physical Performance in a Cross-National Biosocial Survey
Sophia Waldmann, Joseph W Sakshaug, Alexandru Cernat
Journal of Survey Statistics and Methodology, 2023-08-18. DOI: 10.1093/jssam/smad031
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11361789/pdf/

Abstract: Biosocial surveys increasingly use interviewers to collect objective physical health measures (or "biomeasures") in respondents' homes. While interviewers play an important role, their high involvement can lead to unintended interviewer effects on the collected measurements. Such interviewer effects add uncertainty to population estimates and have the potential to lead to erroneous inferences. This study examines interviewer effects on the measurement of physical performance in a cross-national and longitudinal setting using data from the Survey of Health, Ageing and Retirement in Europe. The analyzed biomeasures exhibited moderate-to-large interviewer effects on the measurements, which varied across biomeasure types and across countries. Our findings demonstrate the necessity to better understand the origin of interviewer-related measurement errors in biomeasure collection and to account for these errors in statistical analyses of biomeasure data.

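Interviewer effects of this kind are commonly summarized by the intraclass correlation: the share of measurement variance attributable to interviewers. A one-way ANOVA estimator for balanced groups, a simplification of the multilevel models typically used for such data (all values fabricated for illustration):

```python
import statistics

def icc_oneway(groups):
    """One-way ANOVA estimator of the intraclass correlation for
    balanced groups: the share of total variance attributable to
    the grouping factor (here, interviewers)."""
    k, n = len(groups), len(groups[0])
    grand = statistics.fmean(v for g in groups for v in g)
    msb = n * sum((statistics.fmean(g) - grand) ** 2 for g in groups) / (k - 1)
    msw = sum((v - statistics.fmean(g)) ** 2 for g in groups for v in g) / (k * (n - 1))
    return (msb - msw) / (msb + (n - 1) * msw)

# Fabricated physical-performance readings, one list per interviewer
by_interviewer = [
    [5.0, 6.0, 5.0, 6.0],
    [8.0, 9.0, 8.0, 9.0],
    [2.0, 3.0, 2.0, 3.0],
]
print(icc_oneway(by_interviewer))  # large: interviewers dominate the variance
```

A value near zero would mean interviewers contribute little to the spread of measurements; values this large signal exactly the kind of interviewer-related error the article warns about.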
{"title":"Using Auxiliary Marginal Distributions in Imputations for Nonresponse while Accounting for Survey Weights, with Application to Estimating Voter Turnout","authors":"Jiurui Tang, D Sunshine Hillygus, Jerome P Reiter","doi":"10.1093/jssam/smad033","DOIUrl":"https://doi.org/10.1093/jssam/smad033","url":null,"abstract":"Abstract In many survey settings, population counts or percentages are available for some of the variables in the survey, for example, from censuses, administrative databases, or other high-quality surveys. We present a model-based approach to utilize such auxiliary marginal distributions in multiple imputation for unit and item nonresponse in complex surveys. In doing so, we ensure that the imputations produce design-based estimates that are plausible given the known margins. We introduce and utilize a hybrid missingness model comprising a pattern mixture model for unit nonresponse and selection models for item nonresponse. We also develop a computational strategy for estimating the parameters of and generating imputations with hybrid missingness models. We apply a hybrid missingness model to examine voter turnout by subgroups using the 2018 Current Population Survey for North Carolina. The hybrid missingness model also facilitates modeling measurement errors simultaneously with handling missing values. 
We illustrate this feature with the voter turnout application by examining how results change when we allow for overreporting, that is, individuals self-reporting that they voted when in fact they did not.","PeriodicalId":17146,"journal":{"name":"Journal of Survey Statistics and Methodology","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136336393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
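The authors' approach is fully model-based, but the underlying goal, making survey data consistent with known auxiliary margins, is the same one served by the classical raking (iterative proportional fitting) adjustment, sketched here on a toy 2×2 table. The counts and targets are invented, and raking is a simpler design-based relative of the hybrid missingness model, not the paper's method:

```python
def rake(table, row_targets, col_targets, iters=100):
    """Iterative proportional fitting: rescale cell weights until the
    table's row and column sums match the known population margins."""
    t = [row[:] for row in table]
    for _ in range(iters):
        for i, target in enumerate(row_targets):       # match row margins
            s = sum(t[i])
            t[i] = [v * target / s for v in t[i]]
        for j, target in enumerate(col_targets):       # match column margins
            s = sum(row[j] for row in t)
            for row in t:
                row[j] *= target / s
    return t

# Toy 2x2 sample table (say, age group x reported turnout)
# raked to invented census margins
raked = rake([[30, 20], [10, 40]], row_targets=[60, 40], col_targets=[45, 55])
```

After convergence, both margins of the adjusted table match the targets while the interior cells retain the sample's association structure.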
{"title":"Variable Inclusion Strategies for Effective Quota Sampling and Propensity Modeling: An Application to SARS-COV-2 Infection Prevalence Estimation","authors":"Yan Li, M. Fay, Sally A. Hunsberger, B. Graubard","doi":"10.1093/jssam/smad026","DOIUrl":"https://doi.org/10.1093/jssam/smad026","url":null,"abstract":"\u0000 Public health policymakers must make crucial decisions rapidly during a pandemic. In such situations, accurate measurements from health surveys are essential. As a consequence of limited time and resource constraints, it may be infeasible to implement a probability-based sample that yields high response rates. An alternative approach is to select a quota sample from a large pool of volunteers, with the quota sample selection based on the census distributions of available—often demographic—variables, also known as quota variables. In practice, however, census data may only contain a subset of the required predictor variables. Thus, the realized quota sample can be adjusted by propensity score pseudoweighting using a “reference” probability-based survey that contains more predictor variables. Motivated by the SARS-CoV-2 serosurvey (a quota sample conducted in 2020 by the National Institutes of Health), we identify the condition under which the quota variables can be ignored in constructing the propensity model but still produce nearly unbiased estimation of population means. 
We conduct limited simulations to evaluate the bias and variance reduction properties of alternative weighting strategies for quota sample estimates under three propensity models that account for varying sets of predictors and degrees of correlation among the predictor sets and then apply our findings to the empirical data.","PeriodicalId":17146,"journal":{"name":"Journal of Survey Statistics and Methodology","volume":null,"pages":null},"PeriodicalIF":2.1,"publicationDate":"2023-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46070161","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}