{"title":"Improving Probabilistic Record Linkage Using Statistical Prediction Models","authors":"Angelo Moretti, N. Shlomo","doi":"10.1111/insr.12535","DOIUrl":null,"url":null,"abstract":"Record linkage brings together information from records in two or more data sources that are believed to belong to the same statistical unit based on a common set of matching variables. Matching variables, however, can appear with errors and variations and the challenge is to link statistical units that are subject to error. We provide an overview of record linkage techniques and specifically investigate the classic Fellegi and Sunter probabilistic record linkage framework to assess whether the decision rule for classifying pairs into sets of matches and non‐matches can be improved by incorporating a statistical prediction model. We also study whether the enhanced linkage rule can provide better results in terms of preserving associations between variables in the linked data file that are not used in the matching procedure. A simulation study and an application based on real data are used to evaluate the methods.","PeriodicalId":14479,"journal":{"name":"International Statistical Review","volume":" ","pages":""},"PeriodicalIF":1.7000,"publicationDate":"2022-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Statistical Review","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1111/insr.12535","RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"STATISTICS & PROBABILITY","Score":null,"Total":0}
引用次数: 0
Abstract
Record linkage brings together information from records in two or more data sources that are believed to belong to the same statistical unit based on a common set of matching variables. Matching variables, however, can appear with errors and variations and the challenge is to link statistical units that are subject to error. We provide an overview of record linkage techniques and specifically investigate the classic Fellegi and Sunter probabilistic record linkage framework to assess whether the decision rule for classifying pairs into sets of matches and non‐matches can be improved by incorporating a statistical prediction model. We also study whether the enhanced linkage rule can provide better results in terms of preserving associations between variables in the linked data file that are not used in the matching procedure. A simulation study and an application based on real data are used to evaluate the methods.
期刊介绍:
International Statistical Review is the flagship journal of the International Statistical Institute (ISI) and of its family of Associations. It publishes papers of broad and general interest in statistics and probability. The term Review is to be interpreted broadly. The types of papers that are suitable for publication include (but are not limited to) the following: reviews/surveys of significant developments in theory, methodology, statistical computing and graphics, statistical education, and application areas; tutorials on important topics; expository papers on emerging areas of research or application; papers describing new developments and/or challenges in relevant areas; papers addressing foundational issues; papers on the history of statistics and probability; white papers on topics of importance to the profession or society; and historical assessment of seminal papers in the field and their impact.