Detecting Interviewer Fraud Using Multilevel Models
Lukas Olbrich, Yuliya Kosyakova, J. Sakshaug, Silvia Schwanhäuser
Journal of Survey Statistics and Methodology, published 2023-01-02. DOI: 10.1093/jssam/smac036

Interviewer falsification, such as the complete or partial fabrication of interview data, has been shown to substantially affect survey results. In this study, we apply a method to identify falsifying face-to-face interviewers based on the development of their behavior over the survey field period. We postulate four potential falsifier types: steady low-effort falsifiers, steady high-effort falsifiers, learning falsifiers, and sudden falsifiers. Using large-scale survey data from Germany with verified falsifications, we apply multilevel models with interviewer effects on the intercept, scale, and slope of the interview sequence to test whether falsifiers can be detected based on their dynamic behavior. In addition to identifying a rather high-effort falsifier previously detected by the survey organization, the model flagged two additional suspicious interviewers exhibiting learning behavior, who were subsequently classified as deviant by the survey organization. We additionally apply the analysis approach to publicly available cross-national survey data and find multiple interviewers who show behavior consistent with the postulated falsifier types.
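The dynamic-behavior idea can be illustrated with a deliberately simplified sketch (not the paper's location-scale multilevel model): estimate each interviewer's least-squares trend in some indicator, say interview duration, over their interview sequence, then flag interviewers whose trend deviates sharply from the pool, the pattern expected of a "learning" falsifier. The data, indicator, and threshold below are illustrative assumptions.

```python
import statistics

def interviewer_slopes(data):
    """Least-squares slope of an indicator (e.g., interview duration)
    over each interviewer's interview sequence 1..n."""
    slopes = {}
    for interviewer, ys in data.items():
        n = len(ys)
        xs = range(1, n + 1)
        xbar = (n + 1) / 2
        ybar = sum(ys) / n
        sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
        sxx = sum((x - xbar) ** 2 for x in xs)
        slopes[interviewer] = sxy / sxx
    return slopes

def flag_outliers(slopes, z=2.0):
    """Flag interviewers whose slope lies more than z SDs from the pool mean."""
    vals = list(slopes.values())
    mu, sd = statistics.mean(vals), statistics.stdev(vals)
    return {i for i, s in slopes.items() if sd > 0 and abs(s - mu) > z * sd}
```

A full analysis would instead fit interviewer effects on intercept, scale, and slope jointly in one multilevel model; this per-interviewer regression only conveys why a shrinking duration trend stands out.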
Estimating Web Survey Mode and Panel Effects in a Nationwide Survey of Alcohol Use
Randal ZuWallack, Matt Jans, Thomas Brassell, Kisha Bailly, James Dayton, Priscilla Martinez, Deidre Patterson, Thomas K Greenfield, Katherine J Karriker-Jaffe
Journal of Survey Statistics and Methodology, 11(5), 1089-1109, published 2022-11-02. DOI: 10.1093/jssam/smac028

Random-digit dialing (RDD) telephone surveys are challenged by declining response rates and increasing costs. Many surveys that were traditionally conducted via telephone are seeking cost-effective alternatives, such as address-based sampling (ABS) with self-administered web or mail questionnaires. At a fraction of the cost of both telephone and ABS surveys, opt-in web panels are an attractive alternative. The 2019-2020 National Alcohol Survey (NAS) employed three methods: (1) an RDD telephone survey (the traditional NAS method); (2) an ABS push-to-web survey; and (3) an opt-in web panel. The study reported here evaluated differences among the three data-collection methods, which we refer to as "mode effects," on alcohol consumption and health topics. To evaluate mode effects, multivariate regression models were developed predicting these characteristics, and the presence of a mode effect on each outcome was determined by the significance of the three-level effect (RDD-telephone, ABS-web, opt-in web panel) in each model. Those results were then used to adjust for mode effects and produce a "telephone-equivalent" estimate for the ABS and panel data sources. The study found that ABS-web and RDD were similar for most estimates but exhibited differences for sensitive questions, including getting drunk and experiencing depression. The opt-in web panel exhibited more differences from the other two survey modes. One notable example is the reported rate of drinking alcohol at least 3-4 times per week, which was 21 percent for RDD-phone, 24 percent for ABS-web, and 34 percent for the opt-in web panel. The regression model adjusts for mode effects, improving comparability with past surveys conducted by telephone; however, the models result in higher variance of the estimates. This method of adjusting for mode effects has broad applications to mode and sample transitions throughout the survey research industry.
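As a hypothetical, covariate-free sketch of the adjustment logic (the study itself fits multivariate regression models), the mode effect of each method can be taken as its difference from the RDD-telephone reference, and a "telephone-equivalent" estimate recovered by subtracting that effect. The figures used below are the abstract's drinking-frequency percentages; the function and key names are invented for illustration.

```python
def mode_effects(estimates, reference="rdd_phone"):
    """Mode effect of each method relative to the reference mode
    (simple difference in estimated proportions; no covariates)."""
    ref = estimates[reference]
    return {mode: est - ref for mode, est in estimates.items()}

def telephone_equivalent(estimate, effect):
    """Remove the estimated mode effect to mimic the reference mode."""
    return estimate - effect
```

With the abstract's estimates, the opt-in panel's 34 percent maps back to the telephone's 21 percent, at the cost (as the authors note) of extra variance from estimating the effect itself.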
Using Capture-Recapture Methodology to Enhance Precision of Representative Sampling-Based Case Count Estimates
Robert H Lyles, Yuzi Zhang, Lin Ge, Cameron England, Kevin Ward, Timothy L Lash, Lance A Waller
Journal of Survey Statistics and Methodology, 10(5), 1292-1318, published 2022-11-01. DOI: 10.1093/jssam/smab052

The application of serial principled sampling designs for diagnostic testing is often viewed as an ideal approach to monitoring prevalence and case counts of infectious or chronic diseases. Considering logistics and the need for timeliness and conservation of resources, surveillance efforts can generally benefit from creative designs and accompanying statistical methods to improve the precision of sampling-based estimates and reduce the size of the necessary sample. One option is to augment the analysis with available data from other surveillance streams that identify cases from the population of interest over the same timeframe, but may do so in a highly nonrepresentative manner. We consider monitoring a closed population (e.g., a long-term care facility, patient registry, or community), and encourage the use of capture-recapture methodology to produce an alternative case total estimate to the one obtained by principled sampling. With care in its implementation, even a relatively small simple or stratified random sample not only provides its own valid estimate, but provides the only fully defensible means of justifying a second estimate based on classical capture-recapture methods. We initially propose weighted averaging of the two estimators to achieve greater precision than can be obtained using either alone, and then show how a novel single capture-recapture estimator provides a unified and preferable alternative. We develop a variant on a Dirichlet-multinomial-based credible interval to accompany our hybrid design-based case count estimates, with a view toward improved coverage properties. Finally, we demonstrate the benefits of the approach through simulations designed to mimic an acute infectious disease daily monitoring program or an annual surveillance program to quantify new cases within a fixed patient registry.
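The classical building block can be sketched in a few lines: Chapman's variant of the Lincoln-Petersen estimator treats the representative sample and the nonrepresentative surveillance stream as two captures, and the result can be precision-weighted against the design-based estimate, as in the weighted averaging the authors initially propose (their unified single estimator and Dirichlet-multinomial credible interval are more involved and not shown). The counts in the test are invented.

```python
def chapman_estimate(n1, n2, m):
    """Chapman's nearly unbiased variant of the Lincoln-Petersen estimator
    of the total case count: n1 cases found by the sample-based capture,
    n2 by the surveillance stream, m by both."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1

def chapman_variance(n1, n2, m):
    """Standard approximate variance of the Chapman estimator."""
    return ((n1 + 1) * (n2 + 1) * (n1 - m) * (n2 - m)) / ((m + 1) ** 2 * (m + 2))

def inverse_variance_average(est_a, var_a, est_b, var_b):
    """Precision-weighted combination of two approximately independent estimates."""
    wa, wb = 1 / var_a, 1 / var_b
    return (wa * est_a + wb * est_b) / (wa + wb)
```

Note the qualitative behavior: the larger the overlap m between the two streams, the smaller the implied undercount and hence the estimate.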
A Simple Question Goes a Long Way: A Wording Experiment on Bank Account Ownership
Marco Angrisani, Mick P Couper
Journal of Survey Statistics and Methodology, 10(5), 1172-1182, published 2022-11-01. DOI: 10.1093/jssam/smab045

Ownership of a bank account is an objective measure and should be relatively easy to elicit via survey questions. Yet, depending on the interview mode, the wording of the question and its placement within the survey may influence respondents' answers. The Health and Retirement Study (HRS) asset module, as administered online to members of the Understanding America Study (UAS), yielded substantially lower rates of reported bank account ownership than either a single question on ownership in the Current Population Survey (CPS) or the full asset module administered to HRS panelists (both interviewer-administered surveys). We designed and implemented an experiment in the UAS comparing the original HRS question eliciting bank account ownership with two alternative versions that were progressively simplified. We document strong evidence that the original question leads to systematic underestimation of bank account ownership. In contrast, the proportion of bank account owners obtained from the simplest alternative version of the question is very similar to the population benchmark estimate. We investigate treatment effect heterogeneity by cognitive ability and financial literacy. We find that questionnaire simplification affects responses of individuals with higher cognitive ability substantially less than those with lower cognitive ability. Our results suggest that high-quality data from surveys start from asking the right questions, which should be as simple and precise as possible and carefully adapted to the mode of interview.
Empirical Best Prediction of Small Area Means Based on a Unit-Level Gamma-Poisson Model
Emily J. Berg
Journal of Survey Statistics and Methodology, published 2022-10-07. DOI: 10.1093/jssam/smac026

Existing small area estimation procedures for count data have important limitations. For instance, an M-quantile-based method is known to be less efficient than model-based procedures if the assumptions of the model hold. Also, frequentist inference procedures for Poisson generalized linear mixed models can be computationally intensive or require approximations. Furthermore, area-level models are incapable of incorporating unit-level covariates. We overcome these limitations by developing a small area estimation procedure for a unit-level gamma-Poisson model. The conjugate form of the model permits computationally simple estimation and prediction procedures. We obtain a closed-form expression for the empirical best predictor of the mean as well as a closed-form mean square error estimator. We validate the procedure through simulations. We illustrate the proposed method using a subset of data from the Iowa Seat-Belt Use survey.
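The conjugacy that makes such predictors closed-form can be sketched generically: if y | theta ~ Poisson(n * theta) and theta ~ Gamma(alpha, beta) in the shape/rate parametrization, the posterior is Gamma(y + alpha, n + beta), so the posterior mean of the rate is (y + alpha) / (n + beta), a shrinkage of the direct estimate y/n toward the prior mean alpha/beta. This is a generic conjugate sketch, not the paper's covariate-dependent unit-level predictor or its MSE estimator.

```python
def eb_posterior_mean(y, n, alpha, beta):
    """Posterior mean of a Poisson rate under a Gamma(alpha, beta) prior
    (shape/rate): shrinks the direct estimate y/n toward the prior
    mean alpha/beta, with the data dominating as n grows."""
    return (y + alpha) / (n + beta)
```

In an empirical best (EB) version, alpha and beta would be replaced by estimates fitted from all areas, which is what lends the approach its computational simplicity.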
Automated Classification for Open-Ended Questions with BERT
Hyukjun Gweon, Matthias Schonlau
Journal of Survey Statistics and Methodology, published 2022-09-13. DOI: 10.1093/jssam/smad015

Manual coding of text data from open-ended questions into different categories is time consuming and expensive. Automated coding trains a statistical or machine-learning model on a small subset of manually coded text answers. Recently, pretraining a general language model on vast amounts of unrelated data and then adapting the model to the specific application has proven effective in natural language processing. Using two data sets, we empirically investigate whether BERT, the currently dominant pretrained language model, is more effective at automated coding of answers to open-ended questions than other non-pretrained statistical learning approaches. First, we find that fine-tuning the pretrained BERT parameters is essential; otherwise, BERT is not competitive. Second, fine-tuned BERT barely beats the non-pretrained statistical learning approaches in terms of classification accuracy when trained on 100 manually coded observations. However, BERT's relative advantage increases rapidly when more manually coded observations (e.g., 200-400) are available for training. We conclude that for automatically coding answers to open-ended questions BERT is preferable to non-pretrained models such as support vector machines and boosting.
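For contrast with BERT, a non-pretrained baseline of the kind the paper benchmarks against can be sketched with nothing but the standard library: a toy multinomial naive Bayes coder over bag-of-words counts. The example categories and answers are invented; a real pipeline would fine-tune BERT, for example via the Hugging Face transformers library.

```python
import math
from collections import Counter, defaultdict

def train_nb(texts, labels):
    """Multinomial naive Bayes with Laplace smoothing over whitespace tokens."""
    words_by_label = defaultdict(list)
    for text, label in zip(texts, labels):
        words_by_label[label].extend(text.lower().split())
    vocab = {w for ws in words_by_label.values() for w in ws}
    model = {
        label: (math.log(labels.count(label) / len(labels)),  # log prior
                Counter(ws), len(ws))
        for label, ws in words_by_label.items()
    }
    return model, vocab

def predict_nb(model, vocab, text):
    """Assign the category with the highest posterior log-probability."""
    best, best_lp = None, float("-inf")
    for label, (log_prior, counts, total) in model.items():
        lp = log_prior + sum(
            math.log((counts[w] + 1) / (total + len(vocab)))  # Laplace smoothing
            for w in text.lower().split()
        )
        if lp > best_lp:
            best, best_lp = label, lp
    return best
```

The paper's finding is that with only ~100 coded answers such baselines are nearly as accurate as fine-tuned BERT; BERT pulls ahead as the coded training set grows.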
Modeling Group-Specific Interviewer Effects on Survey Participation Using Separate Coding for Random Slopes in Multilevel Models
J. Herzing, A. Blom, B. Meuleman
Journal of Survey Statistics and Methodology, published 2022-09-02. DOI: 10.1093/jssam/smac025

Despite its importance in terms of survey participation, the literature is sparse on how face-to-face interviewers differentially affect specific groups of sample units. This paper demonstrates how an alternative parametrization of the random components in multilevel models, so-called separate coding, delivers valuable insights into differential interviewer effects for specific groups of sample members. In the example of a face-to-face recruitment interview for a probability-based online panel, we detect small interviewer effects regarding survey participation for non-Internet households, whereas we find sizable interviewer effects for Internet households. We derive practical guidance for survey practitioners to address differential interviewer effects based on the proposed variance decomposition.
An Experimental Evaluation of Two Approaches for Improving Response to Household Screening Efforts in National Mail/Web Surveys
James Wagner, Brady T West, Mick P Couper, Shiyu Zhang, Rebecca Gatward, Raphael Nishimura, Htay-Wah Saw
Journal of Survey Statistics and Methodology, 11(1), 124-140, published 2022-07-12. DOI: 10.1093/jssam/smac024

Survey researchers have carefully modified their data collection operations for various reasons, including the rising costs of data collection and the ongoing Coronavirus disease (COVID-19) pandemic, both of which have made in-person interviewing difficult. For large national surveys that require household (HH) screening to determine survey eligibility, cost-efficient screening methods that do not include in-person visits need additional evaluation and testing. A new study, known as the American Family Health Study (AFHS), recently initiated data collection with a national probability sample, using a sequential mixed-mode mail/web protocol for push-to-web US HH screening (targeting persons aged 18-49 years). To better understand optimal approaches for this type of national screening effort, we embedded two randomized experiments in the AFHS data collection. The first tested the use of bilingual respondent materials, where mailed invitations to the screener were sent in both English and Spanish to 50 percent of addresses with a high predicted likelihood of having a Spanish speaker and 10 percent of all other addresses. We found that the bilingual approach did not increase the response rate of high-likelihood Spanish-speaking addresses, but consistent with prior work, it increased the proportion of eligible Hispanic respondents identified among completed screeners, especially among addresses predicted to have a high likelihood of having Spanish speakers. The second tested a form of nonresponse follow-up, where a subsample of active sampled HHs that had not yet responded to the screening invitations was sent a priority mailing with a $5 incentive, adding to the $2 incentive provided for all sampled HHs in the initial screening invitation. We found this approach to be quite valuable for increasing the screening survey response rate.
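Whether a follow-up like the $5 priority mailing "significantly" raised the screening response rate is the kind of comparison a two-proportion z-test answers. A minimal sketch follows; the counts in the test are invented for illustration, not the AFHS results.

```python
import math

def two_proportion_z(success_a, n_a, success_b, n_b):
    """Two-sided two-proportion z-test under a pooled null; returns (z, p)."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # = 2 * (1 - Phi(|z|))
    return z, p
```

In a real nonresponse follow-up analysis one would also account for the subsampling design when computing standard errors; this sketch assumes simple random assignment.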
A Semiparametric Multiple Imputation Approach to Fully Synthetic Data for Complex Surveys
Mandi Yu, Yulei He, Trivellore E Raghunathan
Journal of Survey Statistics and Methodology, 618-641, published 2022-06-01. DOI: 10.1093/jssam/smac016

Data synthesis is an effective statistical approach for reducing data disclosure risk. Generating fully synthetic data might minimize such risk, but its modeling and application can be difficult for data from large, complex surveys. This article extended the two-stage imputation to simultaneously impute item missing values and generate fully synthetic data. A new combining rule for making inferences using data generated in this manner was developed. Two semiparametric missing data imputation models were adapted to generate fully synthetic data for a skewed continuous variable and a sparse binary variable, respectively. The proposed approach was evaluated using simulated data and real longitudinal data from the Health and Retirement Study. The proposed approach was also compared with two existing synthesis approaches using real data: (1) parametric regression models as implemented in IVEware; and (2) nonparametric Classification and Regression Trees as implemented in the synthpop package for R. The results show that high data utility is maintained for a wide variety of descriptive and model-based statistics using the proposed strategy. The proposed strategy also performs better than existing methods for sophisticated analyses such as factor analysis.
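For orientation only: the familiar baseline that a combining rule for synthetic data generalizes is Rubin's multiple-imputation rule, sketched below. This is the standard rule, not the new rule developed in the article.

```python
import statistics

def rubin_combine(estimates, variances):
    """Standard multiple-imputation combining rule (Rubin's rules):
    pooled point estimate, total variance, and degrees of freedom
    from m completed-data analyses."""
    m = len(estimates)
    qbar = statistics.mean(estimates)
    ubar = statistics.mean(variances)      # average within-imputation variance
    b = statistics.variance(estimates)     # between-imputation variance
    t = ubar + (1 + 1 / m) * b             # total variance
    df = (m - 1) * (1 + ubar / ((1 + 1 / m) * b)) ** 2 if b > 0 else float("inf")
    return qbar, t, df
```

Synthetic-data inference needs modified rules because the between- and within-components play different roles than under ordinary missing-data imputation, which is precisely the gap the article's two-stage combining rule addresses.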
Interviewer Effects in Live Video and Prerecorded Video Interviewing
Brady T West, Ai Rene Ong, Frederick G Conrad, Michael F Schober, Kallan M Larsen, Andrew L Hupp
Journal of Survey Statistics and Methodology, 10(2), 317-336, published 2022-04-01. DOI: 10.1093/jssam/smab040

Live video (LV) communication tools (e.g., Zoom) have the potential to provide survey researchers with many of the benefits of in-person interviewing, while also greatly reducing data collection costs, given that interviewers do not need to travel and make in-person visits to sampled households. The COVID-19 pandemic has exposed the vulnerability of in-person data collection to public health crises, forcing survey researchers to explore remote data collection modes, such as LV interviewing, that seem likely to yield high-quality data without in-person interaction. Given the potential benefits of these technologies, the operational and methodological aspects of video interviewing have started to receive research attention from survey methodologists. Although it is remote, video interviewing still involves respondent-interviewer interaction that introduces the possibility of interviewer effects. No research to date has evaluated this potential threat to the quality of the data collected in video interviews. This research note presents an evaluation of interviewer effects in a recent experimental study of alternative approaches to video interviewing, including both LV interviewing and the use of prerecorded videos of the same interviewers asking questions embedded in a web survey ("prerecorded video" interviewing). We find little evidence of significant interviewer effects when using these two approaches, which is a promising result. We also find that when interviewer effects were present, they tended to be slightly larger in the LV approach, as would be expected in light of its being an interactive approach. We conclude with a discussion of the implications of these findings for future research using video interviewing.
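Interviewer effects of the kind evaluated here are commonly summarized as an intraclass correlation, the share of response variance attributable to interviewers. A simplified moment-based sketch follows (one-way ANOVA estimator with balanced workloads assumed; the study itself used multilevel models rather than this estimator).

```python
import statistics

def interviewer_icc(groups):
    """One-way ANOVA moment estimator of the intraclass correlation:
    the share of response variance attributable to interviewers.
    `groups` is a list of per-interviewer response lists, all of equal
    length n (balanced workloads)."""
    k = len(groups)
    n = len(groups[0])
    grand = statistics.mean(y for g in groups for y in g)
    msb = n * sum((statistics.mean(g) - grand) ** 2 for g in groups) / (k - 1)
    msw = sum(sum((y - statistics.mean(g)) ** 2 for y in g)
              for g in groups) / (k * (n - 1))
    var_between = max(0.0, (msb - msw) / n)   # truncate negative estimates at 0
    return var_between / (var_between + msw)
```

An ICC near zero across most items is what "little evidence of significant interviewer effects" corresponds to in such analyses.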