{"title":"Improving Probabilistic Record Linkage Using Statistical Prediction Models","authors":"Angelo Moretti, N. Shlomo","doi":"10.1111/insr.12535","DOIUrl":"https://doi.org/10.1111/insr.12535","url":null,"abstract":"Record linkage brings together information from records in two or more data sources that are believed to belong to the same statistical unit based on a common set of matching variables. Matching variables, however, can appear with errors and variations and the challenge is to link statistical units that are subject to error. We provide an overview of record linkage techniques and specifically investigate the classic Fellegi and Sunter probabilistic record linkage framework to assess whether the decision rule for classifying pairs into sets of matches and non‐matches can be improved by incorporating a statistical prediction model. We also study whether the enhanced linkage rule can provide better results in terms of preserving associations between variables in the linked data file that are not used in the matching procedure. A simulation study and an application based on real data are used to evaluate the methods.","PeriodicalId":14479,"journal":{"name":"International Statistical Review","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2022-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46915188","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Elaboration Models with Symmetric Information Divergence.","authors":"Majid Asadi, Karthik Devarajan, Nader Ebrahimi, Ehsan S Soofi, Lauren Spirko-Burns","doi":"10.1111/insr.12499","DOIUrl":"10.1111/insr.12499","url":null,"abstract":"Various statistical methodologies embed a probability distribution in a more flexible family of distributions. The latter is called elaboration model, which is constructed by choice or a formal procedure and evaluated by asymmetric measures such as the likelihood ratio and Kullback-Leibler information. The use of asymmetric measures can be problematic for this purpose. This paper introduces two formal procedures, referred to as link functions, that embed any baseline distribution with a continuous density on the real line into model elaborations. Conditions are given for the link functions to render symmetric Kullback-Leibler divergence, Rényi divergence, and phi-divergence family. The first link function elaborates quantiles of the baseline probability distribution. This approach produces continuous counterparts of the binary probability models. Examples include the Cauchy, probit, logit, Laplace, and Student-t links. The second link function elaborates the baseline survival function. Examples include the proportional odds and change point links. The logistic distribution is characterized as the one that satisfies the conditions for both links. An application demonstrates advantages of symmetric divergence measures for assessing the efficacy of covariates.","PeriodicalId":14479,"journal":{"name":"International Statistical Review","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10193517/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9509528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A joint normal-binary (probit) model","authors":"Margaux Delporte, Steffen Fieuws, Geert Molenberghs, Geert Verbeke, Simeon Situma Wanyama, Elpis Hatziagorou, Christiane De Boeck","doi":"10.1111/insr.12532","DOIUrl":"10.1111/insr.12532","url":null,"abstract":"In biomedical research, often hierarchical binary and continuous responses need to be jointly modelled. In joint generalised linear mixed models, this can be done with correlated random effects, which allows examining the association structure between the various responses and the evolution of this association over time. In addition, the effect of covariates on all outcomes can be assessed simultaneously. Still, investigating this association is often limited to examining the correlations between the responses on an underlying scale. In addition, the interpretation of this hierarchical model is conditional on the subject-specific random effects. This paper extends this approach and shows how manifest correlations can be computed, that is, the associations between the observed responses. Further, a marginal model is formulated, in which the interpretation is no longer conditional on the random effects. In addition, prediction intervals are derived of one subvector of responses conditional on the other. These methods are applied in a case study of the lung function and allergic bronchopulmonary aspergillosis in patients with cystic fibrosis.","PeriodicalId":14479,"journal":{"name":"International Statistical Review","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2022-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44702563","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Likelihood-Based Inference for the Finite Population Mean with Post-Stratification Information Under Non-Ignorable Non-Response","authors":"Sahar Z. Zangeneh, Roderick J. Little","doi":"10.1111/insr.12527","DOIUrl":"10.1111/insr.12527","url":null,"abstract":"We describe models and likelihood-based estimation of the finite population mean for a survey subject to unit non-response, when post-stratification information is available from external sources. A feature of the models is that they do not require the assumption that the data are missing at random (MAR). As a result, the proposed models provide estimates under weaker assumptions than those required in the absence of post-stratification information, thus allowing more robust inferences. In particular, we describe models for estimation of the finite population mean of a survey outcome with categorical covariates and externally observed categorical post-stratifiers. We compare inferences from the proposed method with existing design-based estimators via simulations. We apply our methods to school-level data from California Department of Education to estimate the mean academic performance index (API) score in years 1999 and 2000. We end with a discussion.","PeriodicalId":14479,"journal":{"name":"International Statistical Review","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2022-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47286071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Global seasonal and pandemic patterns in influenza: An application of longitudinal study designs","authors":"Elena N. Naumova, Ryan B. Simpson, Bingjie Zhou, Meghan A. Hartwick","doi":"10.1111/insr.12529","DOIUrl":"10.1111/insr.12529","url":null,"abstract":"The confluence of growing analytic capacities and global surveillance systems for seasonal infections has created new opportunities to further develop statistical methodology and advance the understanding of the global disease dynamics. We developed a framework to characterise the seasonality of infectious diseases for publicly available global health surveillance data. Specifically, we aimed to estimate the seasonal characteristics and their uncertainty using mixed effects models with harmonic components and the δ-method and develop multi-panel visualisations to present complex interplay of seasonal peaks across geographic locations. We compiled a set of 2 422 weekly time series of 14 reported outcomes for 173 Member States from the World Health Organization's (WHO) international influenza virological surveillance system, FluNet, from 02 January 1995 through 20 June 2021. We produced an analecta of data visualisations to describe global travelling waves of influenza while addressing issues of data completeness and credibility. Our results offer directions for further improvements in data collection, reporting, analysis and development of statistical methodology and predictive approaches.","PeriodicalId":14479,"journal":{"name":"International Statistical Review","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2022-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41356663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Synergy of Biostatistics and Epidemiology in Air Pollution Health Effects Studies","authors":"Douglas W. Dockery","doi":"10.1111/insr.12525","DOIUrl":"10.1111/insr.12525","url":null,"abstract":"The extraordinary advances in quantifying the health effects of ambient air pollution over the last five decades have led to dramatic improvement in air quality in the United States. This work has been possible through innovative epidemiologic study designs coupled with advanced statistical analytic methods. This paper presents a historical perspective on the coordinated developments of epidemiologic designs and statistical methods for air pollution health effects studies at the Harvard School of Public Health.","PeriodicalId":14479,"journal":{"name":"International Statistical Review","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2022-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/b1/fc/INSR-90-S67.PMC9828424.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10526357","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Path algorithms for fused lasso signal approximator with application to COVID-19 spread in Korea","authors":"Won Son, Johan Lim, Donghyeon Yu","doi":"10.1111/insr.12521","DOIUrl":"10.1111/insr.12521","url":null,"abstract":"The fused lasso signal approximator (FLSA) is a smoothing procedure for noisy observations that uses fused lasso penalty on unobserved mean levels to find sparse signal blocks. Several path algorithms have been developed to obtain the whole solution path of the FLSA. However, it is known that the FLSA has model selection inconsistency when the underlying signals have a stair-case block, where three consecutive signal blocks are either strictly increasing or decreasing. Modified path algorithms for the FLSA have been proposed to guarantee model selection consistency regardless of the stair-case block. In this paper, we provide a comprehensive review of the path algorithms for the FLSA and prove the properties of the recently modified path algorithms' hitting times. Specifically, we reinterpret the modified path algorithm as the path algorithm for local FLSA problems and reveal the condition that the hitting time for the fusion of the modified path algorithm is not monotone in a tuning parameter. To recover the monotonicity of the solution path, we propose a pathwise adaptive FLSA having monotonicity with similar performance as the modified solution path algorithm. Finally, we apply the proposed method to the number of daily-confirmed cases of COVID-19 in Korea to identify the change points of its spread.","PeriodicalId":14479,"journal":{"name":"International Statistical Review","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2022-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9874640/pdf/INSR-9999-0.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10584381","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accounting for Non-ignorable Sampling and Non-response in Statistical Matching","authors":"Daniela Marella, Danny Pfeffermann","doi":"10.1111/insr.12524","DOIUrl":"10.1111/insr.12524","url":null,"abstract":"Data for statistical analysis is often available from different samples, with each sample containing measurements on only some of the variables of interest. Statistical matching attempts to generate a fused database containing matched measurements on all the target variables. In this article, we consider the use of statistical matching when the samples are drawn by informative sampling designs and are subject to not missing at random non-response. The problem with ignoring the sampling process and non-response is that the distribution of the data observed for the responding units can be very different from the distribution holding for the population data, which may distort the inference process and result in a matched database that misrepresents the joint distribution in the population. Our proposed methodology employs the empirical likelihood approach and is shown to perform well in a simulation experiment and when applied to real sample data.","PeriodicalId":14479,"journal":{"name":"International Statistical Review","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2022-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/insr.12524","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47607196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Thinking Clearly with Data: A Guide to Quantitative Reasoning and Analysis. Ethan Bueno de Mesquita and Anthony Fowler. Princeton University Press, 2021, 400 pages, $95.00/£74.00, hardback ISBN: 978‐0‐691‐21436‐8","authors":"G. Dekkers","doi":"10.1111/insr.12530","DOIUrl":"https://doi.org/10.1111/insr.12530","url":null,"abstract":"","PeriodicalId":14479,"journal":{"name":"International Statistical Review","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2022-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49030563","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Bootstrap Variance Procedure for the Generalised Regression Estimator","authors":"Marius Stefan, Michael A. Hidiroglou","doi":"10.1111/insr.12528","DOIUrl":"10.1111/insr.12528","url":null,"abstract":"The generalised regression estimator (GREG) uses auxiliary data that are available from the finite population to improve the efficiency of the estimator of a total (mean). Estimators of the variance of GREG that have been proposed in the sampling literature include those based on Taylor linearisation and the jackknife techniques. Approximations based on Taylor expansions are reasonable for large samples. However, when the sample size is small, the Taylor-based variance estimator has a large negative bias. The jackknife variance estimators overestimate the variance of GREG for small sample sizes. We offset these setbacks using a bootstrap procedure for estimating the variance of the GREG. The method uses a bootstrap population constructed with the model underlying the GREG estimator. Repeated samples are selected in the bootstrap population according to the design used to select the initial sample, and the variability associated with these bootstrap samples is used to compute the proposed bootstrap variance estimator. Simulations show that the new bootstrap estimator has a small bias for samples that have few observations.","PeriodicalId":14479,"journal":{"name":"International Statistical Review","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2022-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48498860","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}