{"title":"Hansen Lecture 2022: The Evolution of the Use of Models in Survey Sampling","authors":"Richard Valliant","doi":"10.1093/jssam/smad021","DOIUrl":"https://doi.org/10.1093/jssam/smad021","url":null,"abstract":"Abstract Morris Hansen made seminal contributions to the early development of sampling theory, including convincing government survey administrators to use probability sampling as opposed to nonprobability (NP) methods like quota sampling. He codified many of the early results in design-based sampling theory in his 1953 two-volume set co-authored with Hurwitz and Madow. Since those developments, the explicit use of models has proliferated in sampling for use in basic point estimation, nonresponse and noncoverage adjustment, imputation, and a variety of other areas. This paper summarizes some of the early developments, controversies in the design-based versus model-based debate, and uses of models for inference from probability and NP samples.","PeriodicalId":17146,"journal":{"name":"Journal of Survey Statistics and Methodology","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135259876","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Survey Consent to Administrative Data Linkage: Five Experiments on Wording and Format","authors":"A. Jäckle, Jonathan Burton, M. Couper, Thomas F. Crossley, Sandra Walzenbach","doi":"10.1093/jssam/smad019","DOIUrl":"https://doi.org/10.1093/jssam/smad019","url":null,"abstract":"To maximize the value of the data while minimizing respondent burden, survey data are increasingly linked to administrative records. Record linkage often requires the informed consent of survey respondents and failure to obtain consent reduces sample size and may lead to selection bias. Relatively little is known about how best to word and format consent requests in surveys. We conducted a series of experiments in a probability household panel and an online access panel to understand how various features of the design of the consent request can affect informed consent. We experimentally varied: (i) the readability of the consent request, (ii) placement of the consent request in the survey, (iii) consent as default versus the standard opt-in consent question, (iv) offering additional information, and (v) a priming treatment focusing on trust in the data holder. For each experiment, we examine the effects of the treatments on consent rates, objective understanding of the consent request (measured with knowledge test questions), subjective understanding (how well the respondent felt they understood the request), confidence in their decision, response times, and whether they read any of the additional information materials. We find that the default wording and offering additional information do not increase consent rates. Improving the readability of the consent question increases objective understanding but does not increase the consent rate. However, asking for consent early in the survey and priming respondents to consider their trust in the administrative data holder both increase consent rates without negatively affecting understanding of the request.","PeriodicalId":17146,"journal":{"name":"Journal of Survey Statistics and Methodology","volume":" ","pages":""},"PeriodicalIF":2.1,"publicationDate":"2023-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44943344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pseudo-Bayesian Small-Area Estimation","authors":"G. Datta, Juhyung Lee, Jiacheng Li","doi":"10.1093/jssam/smad012","DOIUrl":"https://doi.org/10.1093/jssam/smad012","url":null,"abstract":"In sample surveys, a subpopulation is referred to as a “small area” or “small domain” if its sample alone is too small to yield an adequately accurate estimate of a characteristic. In small-area estimation, the sample sizes from various subpopulations are often too small to estimate their means accurately, and so one borrows strength from similar subpopulations through an appropriate model based on relevant covariates. The empirical best linear unbiased prediction (EBLUP) method has been the dominant frequentist model-based approach in small-area estimation. This method relies on estimation of model parameters based on the marginal distribution of the data. As an alternative, the observed best prediction (OBP) method estimates the parameters by minimizing an objective function implied by the total mean squared prediction error. We use this objective function in the Fay–Herriot model to construct a pseudo-posterior distribution for the model parameters under nearly noninformative priors. Data analysis and simulation show that the pseudo-Bayesian estimators (PBEs) compete favorably with the OBPs and EBLUPs. The PBE estimates are robust to mean misspecification and have good frequentist properties. Being Bayesian by construction, they automatically avoid negative estimates of standard errors, enjoy a dual justification, and provide an attractive alternative for practitioners.","PeriodicalId":17146,"journal":{"name":"Journal of Survey Statistics and Methodology","volume":" ","pages":""},"PeriodicalIF":2.1,"publicationDate":"2023-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48464247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Maximum Entropy Design by a Markov Chain Process","authors":"Yves Tillé, Bardia Panahbehagh","doi":"10.1093/jssam/smad010","DOIUrl":"https://doi.org/10.1093/jssam/smad010","url":null,"abstract":"Abstract In this article, we study an implementation of maximum entropy (ME) design utilizing a Markov chain. This design, which is also called the conditional Poisson sampling design, is difficult to implement. We first present a new method for calculating the weights associated with conditional Poisson sampling. Then, we study a very simple method of random exchanges of units, which allows switching from one sample to another. This exchange system defines an irreducible and aperiodic Markov chain whose ME design is the stationary distribution. The design can be implemented without enumerating all possible samples. By repeating the exchange process a large number of times, it is possible to select a sample that respects the design. The process is simple to implement, and its convergence rate has been investigated theoretically and by simulation, which led to promising results.","PeriodicalId":17146,"journal":{"name":"Journal of Survey Statistics and Methodology","volume":"237 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135961420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Comprehensive Overview of Unit-Level Modeling of Survey Data for Small Area Estimation Under Informative Sampling","authors":"Paul A Parker, Ryan Janicki, Scott H Holan","doi":"10.1093/jssam/smad020","DOIUrl":"https://doi.org/10.1093/jssam/smad020","url":null,"abstract":"Abstract Model-based small area estimation is frequently used in conjunction with survey data to establish estimates for under-sampled or unsampled geographies. These models can be specified at either the area-level, or the unit-level, but unit-level models often offer potential advantages such as more precise estimates and easy spatial aggregation. Nevertheless, relative to area-level models, literature on unit-level models is less prevalent. In modeling small areas at the unit level, challenges often arise as a consequence of the informative sampling mechanism used to collect the survey data. This article provides a comprehensive methodological review for unit-level models under informative sampling, with an emphasis on Bayesian approaches.","PeriodicalId":17146,"journal":{"name":"Journal of Survey Statistics and Methodology","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135860164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparison of Unit-Level Small Area Estimation Modeling Approaches for Survey Data Under Informative Sampling","authors":"Paul A Parker, Ryan Janicki, Scott H Holan","doi":"10.1093/jssam/smad022","DOIUrl":"https://doi.org/10.1093/jssam/smad022","url":null,"abstract":"Abstract Unit-level modeling strategies offer many advantages relative to the area-level models that are most often used in the context of small area estimation. For example, unit-level models aggregate naturally, allowing for estimates at any desired resolution, and also offer greater precision in many cases. We compare a variety of the methods available in the literature related to unit-level modeling for small area estimation. Specifically, to provide insight into the differences between methods, we conduct a simulation study that compares several of the general approaches. In addition, the methods used for simulation are further illustrated through an application to the American Community Survey.","PeriodicalId":17146,"journal":{"name":"Journal of Survey Statistics and Methodology","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135859749","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Leveraging Predictive Modelling from Multiple Sources of Big Data to Improve Sample Efficiency and Reduce Survey Nonresponse Error","authors":"David Dutwin, Patrick Coyle, I. Bilgen, N. English","doi":"10.1093/jssam/smad016","DOIUrl":"https://doi.org/10.1093/jssam/smad016","url":null,"abstract":"Big data has been fruitfully leveraged as a supplement for survey data—and sometimes as its replacement—and in the best of worlds, as a “force multiplier” to improve survey analytics and insight. We detail a use case, the big data classifier (BDC), as a replacement to the more traditional methods of targeting households in survey sampling for given specific household and personal attributes. Much like geographic targeting and the use of commercial vendor flags, we detail the ability of BDCs to predict the likelihood that any given household is, for example, one that contains a child or someone who is Hispanic. We specifically build 15 BDCs with the combined data from a large nationally representative probability-based panel and a range of big data from public and private sources, and then assess the effectiveness of these BDCs to successfully predict their range of predicted attributes across three large survey datasets. For each BDC and each data application, we compare the relative effectiveness of the BDCs against historical sample targeting techniques of geographic clustering and vendor flags. Overall, BDCs offer a modest improvement in their ability to target subpopulations. We find classes of predictions that are consistently more effective, and others where the BDCs are on par with vendor flagging, though always superior to geographic clustering. We present some of the relative strengths and weaknesses of BDCs as a new method to identify and subsequently sample low-incidence and other populations.","PeriodicalId":17146,"journal":{"name":"Journal of Survey Statistics and Methodology","volume":" ","pages":""},"PeriodicalIF":2.1,"publicationDate":"2023-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45867293","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Primer on the Data Cleaning Pipeline","authors":"Rebecca C Steorts","doi":"10.1093/jssam/smad017","DOIUrl":"https://doi.org/10.1093/jssam/smad017","url":null,"abstract":"Abstract The availability of both structured and unstructured databases, such as electronic health data, social media data, patent data, and surveys that are often updated in real time, among others, has grown rapidly over the past decade. With this expansion, the statistical and methodological questions around data integration, or rather merging multiple data sources, have also grown. Specifically, the science of the “data cleaning pipeline” contains four stages that allow an analyst to perform downstream tasks, predictive analyses, or statistical analyses on “cleaned data.” This article provides a review of this emerging field, introducing technical terminology and commonly used methods.","PeriodicalId":17146,"journal":{"name":"Journal of Survey Statistics and Methodology","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135194364","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Correction to: Improving Statistical Matching when Auxiliary Information is Available","authors":"","doi":"10.1093/jssam/smad023","DOIUrl":"https://doi.org/10.1093/jssam/smad023","url":null,"abstract":"","PeriodicalId":17146,"journal":{"name":"Journal of Survey Statistics and Methodology","volume":"115 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135540950","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Is there a Day of the Week Effect on Panel Response Rate to an Online Questionnaire Email Invitation?","authors":"Chloe Howard, Lara M. Greaves, D. Osborne, C. Sibley","doi":"10.1093/jssam/smad014","DOIUrl":"https://doi.org/10.1093/jssam/smad014","url":null,"abstract":"Does the day of the week an email is sent inviting existing participants to complete a follow-up questionnaire for an annual online survey impact response rate? We answer this question using a preregistered experiment conducted as part of an ongoing national probability panel study in New Zealand. Across 14 consecutive days, existing participants in a panel study were randomly allocated a day of the week to receive an email inviting them to complete the next wave of the questionnaire online (N = 26,126). Valid responses included questionnaires completed within 31 days of receiving the initial invitation. Results revealed that the day the invitation was sent did not affect the likelihood of responding. These results are reassuring for researchers conducting ongoing panel studies and suggest that, once participants have joined a panel, the day of the week they are contacted does not impact their likelihood of responding to subsequent waves.","PeriodicalId":17146,"journal":{"name":"Journal of Survey Statistics and Methodology","volume":" ","pages":""},"PeriodicalIF":2.1,"publicationDate":"2023-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46873457","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}