{"title":"Discussion of the 2022 Hansen Lecture: “The Evolution of the Use of Models in Survey Sampling”","authors":"F. Breidt","doi":"10.1093/jssam/smad030","DOIUrl":"https://doi.org/10.1093/jssam/smad030","url":null,"abstract":"The 2022 Hansen Lecture gave a broad overview of the use of models in survey sampling, with emphasis on modeling approaches to incorporating auxiliary information in survey estimators. This discussion expands upon some issues in model-assisted estimation, exploring data needs and the availability of multipurpose weights for advanced modeling methods.","PeriodicalId":17146,"journal":{"name":"Journal of Survey Statistics and Methodology","volume":null,"pages":null},"PeriodicalIF":2.1,"publicationDate":"2023-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48827563","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Estimating the Size of Clustered Hidden Populations","authors":"Laura J Gamble, L. Johnston, P. Pham, P. Vinck, Katherine R. McLaughlin","doi":"10.1093/jssam/smad025","DOIUrl":"https://doi.org/10.1093/jssam/smad025","url":null,"abstract":"Successive sampling population size estimation (SS-PSE) is a method used by government agencies, aid organizations, and researchers around the world to estimate the size of hidden populations using data from respondent-driven sampling surveys. SS-PSE addresses a specific need in estimation, since many countries rely on having accurate size estimates to plan and allocate finite resources to address the needs of hidden populations. However, SS-PSE relies on several assumptions, one of which requires the underlying social network of the hidden population to be fully connected. We propose two modifications to SS-PSE for estimating the size of hidden populations whose underlying social network is composed of disjoint clusters. The first method is a theoretically straightforward extension of SS-PSE, but it relies on prior information that may be difficult to obtain in practice. The second method extends the Bayesian SS-PSE model by introducing a new set of parameters that allow for clustered estimation without requiring the additional prior information. After providing theoretical justification for both novel methods, we then assess their performance using simulations and apply the Clustered SS-PSE method to a population of internally displaced persons in Bamako, Mali.","PeriodicalId":17146,"journal":{"name":"Journal of Survey Statistics and Methodology","volume":null,"pages":null},"PeriodicalIF":2.1,"publicationDate":"2023-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48480602","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multivariate Small-area Estimation for Mixed-type Response Variables With Item Nonresponse","authors":"Haoliang Sun, Emily J. Berg, Zhengyuan Zhu","doi":"10.1093/jssam/smad018","DOIUrl":"https://doi.org/10.1093/jssam/smad018","url":null,"abstract":"Many surveys collect information on discrete characteristics and continuous variables, that is, mixed-type variables. Small-area statistics of interest include means or proportions of the response variables as well as their domain means, which are the mean values at each level of a different categorical variable. However, item nonresponse in survey data increases the complexity of small-area estimation. To address this issue, we propose a multivariate mixed-effects model for mixed-type response variables subject to item nonresponse. We apply this method to two data structures where the data are missing completely at random by design. We use empirical data from two separate studies: a survey of pet owners and a dataset from the National Resources Inventory. In these applications, our proposed method leads to improvements relative to a direct estimator and a predictor based on a univariate model.","PeriodicalId":17146,"journal":{"name":"Journal of Survey Statistics and Methodology","volume":null,"pages":null},"PeriodicalIF":2.1,"publicationDate":"2023-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49177498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hansen Lecture 2022: The Evolution of the Use of Models in Survey Sampling","authors":"Richard Valliant","doi":"10.1093/jssam/smad021","DOIUrl":"https://doi.org/10.1093/jssam/smad021","url":null,"abstract":"Morris Hansen made seminal contributions to the early development of sampling theory, including convincing government survey administrators to use probability sampling as opposed to nonprobability (NP) methods like quota sampling. He codified many of the early results in design-based sampling theory in his 1953 two-volume set co-authored with Hurwitz and Madow. Since those developments, the explicit use of models has proliferated in sampling for use in basic point estimation, nonresponse and noncoverage adjustment, imputation, and a variety of other areas. This paper summarizes some of the early developments, controversies in the design-based versus model-based debate, and uses of models for inference from probability and NP samples.","PeriodicalId":17146,"journal":{"name":"Journal of Survey Statistics and Methodology","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135259876","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Survey Consent to Administrative Data Linkage: Five Experiments on Wording and Format","authors":"A. Jäckle, Jonathan Burton, M. Couper, Thomas F. Crossley, Sandra Walzenbach","doi":"10.1093/jssam/smad019","DOIUrl":"https://doi.org/10.1093/jssam/smad019","url":null,"abstract":"To maximize the value of the data while minimizing respondent burden, survey data are increasingly linked to administrative records. Record linkage often requires the informed consent of survey respondents, and failure to obtain consent reduces sample size and may lead to selection bias. Relatively little is known about how best to word and format consent requests in surveys. We conducted a series of experiments in a probability household panel and an online access panel to understand how various features of the design of the consent request can affect informed consent. We experimentally varied: (i) the readability of the consent request, (ii) placement of the consent request in the survey, (iii) consent as default versus the standard opt-in consent question, (iv) offering additional information, and (v) a priming treatment focusing on trust in the data holder. For each experiment, we examine the effects of the treatments on consent rates, objective understanding of the consent request (measured with knowledge test questions), subjective understanding (how well the respondent felt they understood the request), confidence in their decision, response times, and whether they read any of the additional information materials. We find that the default wording and offering additional information do not increase consent rates. Improving the readability of the consent question increases objective understanding but does not increase the consent rate. However, asking for consent early in the survey and priming respondents to consider their trust in the administrative data holder both increase consent rates without negatively affecting understanding of the request.","PeriodicalId":17146,"journal":{"name":"Journal of Survey Statistics and Methodology","volume":null,"pages":null},"PeriodicalIF":2.1,"publicationDate":"2023-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44943344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pseudo-Bayesian Small-Area Estimation","authors":"G. Datta, Juhyung Lee, Jiacheng Li","doi":"10.1093/jssam/smad012","DOIUrl":"https://doi.org/10.1093/jssam/smad012","url":null,"abstract":"In sample surveys, a subpopulation is referred to as a “small area” or “small domain” if it does not have a large enough sample that alone will yield an adequately accurate estimate of a characteristic. In small-area estimation, the sample size from various subpopulations is often too small to accurately estimate its mean, and so one borrows strength from similar subpopulations through an appropriate model based on relevant covariates. The empirical best linear unbiased prediction (EBLUP) method has been the dominant frequentist model-based approach in small-area estimation. This method relies on estimation of model parameters based on the marginal distribution of the data. As an alternative to this method, the observed best prediction (OBP) method estimates the parameters by minimizing an objective function that is implied by the total mean squared prediction error. We use this objective function in the Fay–Herriot model to construct a pseudo-posterior distribution for the model parameters under nearly noninformative priors for them. Data analysis and simulation show that the pseudo-Bayesian estimators (PBEs) compete favorably with the OBPs and EBLUPs. The PBE estimates are robust to mean misspecification and have good frequentist properties. Being Bayesian by construction, they automatically avoid negative estimates of standard errors, enjoy a dual justification, and provide an attractive alternative to practitioners.","PeriodicalId":17146,"journal":{"name":"Journal of Survey Statistics and Methodology","volume":null,"pages":null},"PeriodicalIF":2.1,"publicationDate":"2023-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48464247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Maximum Entropy Design by a Markov Chain Process","authors":"Yves Tillé, Bardia Panahbehagh","doi":"10.1093/jssam/smad010","DOIUrl":"https://doi.org/10.1093/jssam/smad010","url":null,"abstract":"In this article, we study an implementation of maximum entropy (ME) design utilizing a Markov chain. This design, which is also called the conditional Poisson sampling design, is difficult to implement. We first present a new method for calculating the weights associated with conditional Poisson sampling. Then, we study a very simple method of random exchanges of units, which allows switching from one sample to another. This exchange system defines an irreducible and aperiodic Markov chain whose ME design is the stationary distribution. The design can be implemented without enumerating all possible samples. By repeating the exchange process a large number of times, it is possible to select a sample that respects the design. The process is simple to implement, and its convergence rate has been investigated theoretically and by simulation, which led to promising results.","PeriodicalId":17146,"journal":{"name":"Journal of Survey Statistics and Methodology","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135961420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Comprehensive Overview of Unit-Level Modeling of Survey Data for Small Area Estimation Under Informative Sampling","authors":"Paul A Parker, Ryan Janicki, Scott H Holan","doi":"10.1093/jssam/smad020","DOIUrl":"https://doi.org/10.1093/jssam/smad020","url":null,"abstract":"Model-based small area estimation is frequently used in conjunction with survey data to establish estimates for under-sampled or unsampled geographies. These models can be specified at either the area level or the unit level, but unit-level models often offer potential advantages such as more precise estimates and easy spatial aggregation. Nevertheless, relative to area-level models, literature on unit-level models is less prevalent. In modeling small areas at the unit level, challenges often arise as a consequence of the informative sampling mechanism used to collect the survey data. This article provides a comprehensive methodological review for unit-level models under informative sampling, with an emphasis on Bayesian approaches.","PeriodicalId":17146,"journal":{"name":"Journal of Survey Statistics and Methodology","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135860164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparison of Unit-Level Small Area Estimation Modeling Approaches for Survey Data Under Informative Sampling","authors":"Paul A Parker, Ryan Janicki, Scott H Holan","doi":"10.1093/jssam/smad022","DOIUrl":"https://doi.org/10.1093/jssam/smad022","url":null,"abstract":"Unit-level modeling strategies offer many advantages relative to the area-level models that are most often used in the context of small area estimation. For example, unit-level models aggregate naturally, allowing for estimates at any desired resolution, and also offer greater precision in many cases. We compare a variety of the methods available in the literature related to unit-level modeling for small area estimation. Specifically, to provide insight into the differences between methods, we conduct a simulation study that compares several of the general approaches. In addition, the methods used for simulation are further illustrated through an application to the American Community Survey.","PeriodicalId":17146,"journal":{"name":"Journal of Survey Statistics and Methodology","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135859749","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Leveraging Predictive Modelling from Multiple Sources of Big Data to Improve Sample Efficiency and Reduce Survey Nonresponse Error","authors":"David Dutwin, Patrick Coyle, I. Bilgen, N. English","doi":"10.1093/jssam/smad016","DOIUrl":"https://doi.org/10.1093/jssam/smad016","url":null,"abstract":"Big data has been fruitfully leveraged as a supplement for survey data (and sometimes as its replacement) and, in the best of worlds, as a “force multiplier” to improve survey analytics and insight. We detail a use case, the big data classifier (BDC), as a replacement for the more traditional methods of targeting households in survey sampling for specific household and personal attributes. Much like geographic targeting and the use of commercial vendor flags, we detail the ability of BDCs to predict the likelihood that any given household is, for example, one that contains a child or someone who is Hispanic. We specifically build 15 BDCs with the combined data from a large nationally representative probability-based panel and a range of big data from public and private sources, and then assess the effectiveness of these BDCs to successfully predict their range of predicted attributes across three large survey datasets. For each BDC and each data application, we compare the relative effectiveness of the BDCs against historical sample targeting techniques of geographic clustering and vendor flags. Overall, BDCs offer a modest improvement in their ability to target subpopulations. We find classes of predictions that are consistently more effective, and others where the BDCs are on par with vendor flagging, though always superior to geographic clustering. We present some of the relative strengths and weaknesses of BDCs as a new method to identify and subsequently sample low incidence and other populations.","PeriodicalId":17146,"journal":{"name":"Journal of Survey Statistics and Methodology","volume":null,"pages":null},"PeriodicalIF":2.1,"publicationDate":"2023-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45867293","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}