{"title":"Simple Strategies for Improving Inference with Linked Data: A Case Study of the 1850-1930 IPUMS Linked Representative Historical Samples.","authors":"Martha Bailey, Connor Cole, Catherine Massey","doi":"10.1080/01615440.2019.1630343","DOIUrl":null,"url":null,"abstract":"<p><p>New large-scale linked data are revolutionizing quantitative history and demography. This paper proposes two complementary strategies for improving inference with linked historical data: the use of validation variables to identify higher quality links and a simple, regression-based weighting procedure to increase the representativeness of custom research samples. We demonstrate the potential value of these strategies using the 1850-1930 Integrated Public Use Microdata Series Linked Representative Samples (IPUMS-LRS)-a high quality, publicly available linked historical dataset. We show that, while incorrect linking rates appear low in the IPUMS-LRS, researchers can reduce error rates further using validation variables. We also show how researchers can reweight linked samples to balance observed characteristics in the linked sample with those in a reference population using a simple regression-based procedure.</p>","PeriodicalId":45535,"journal":{"name":"Historical Methods","volume":null,"pages":null},"PeriodicalIF":1.6000,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7523567/pdf/nihms-1534017.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Historical Methods","FirstCategoryId":"98","ListUrlMain":"https://doi.org/10.1080/01615440.2019.1630343","RegionNum":2,"RegionCategory":"历史学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2019/10/31 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"HISTORY","Score":null,"Total":0}
引用次数: 0
Abstract
New large-scale linked data are revolutionizing quantitative history and demography. This paper proposes two complementary strategies for improving inference with linked historical data: the use of validation variables to identify higher quality links and a simple, regression-based weighting procedure to increase the representativeness of custom research samples. We demonstrate the potential value of these strategies using the 1850-1930 Integrated Public Use Microdata Series Linked Representative Samples (IPUMS-LRS)-a high quality, publicly available linked historical dataset. We show that, while incorrect linking rates appear low in the IPUMS-LRS, researchers can reduce error rates further using validation variables. We also show how researchers can reweight linked samples to balance observed characteristics in the linked sample with those in a reference population using a simple regression-based procedure.
期刊介绍:
Historical Methodsreaches an international audience of social scientists concerned with historical problems. It explores interdisciplinary approaches to new data sources, new approaches to older questions and material, and practical discussions of computer and statistical methodology, data collection, and sampling procedures. The journal includes the following features: “Evidence Matters” emphasizes how to find, decipher, and analyze evidence whether or not that evidence is meant to be quantified. “Database Developments” announces major new public databases or large alterations in older ones, discusses innovative ways to organize them, and explains new ways of categorizing information.