Simple Strategies for Improving Inference with Linked Data: A Case Study of the 1850-1930 IPUMS Linked Representative Historical Samples.

Impact factor: 1.6 · Zone 2 (History) · JCR Q1 (History)
Historical Methods. Pub Date: 2020-01-01. Epub Date: 2019-10-31. DOI: 10.1080/01615440.2019.1630343
Martha Bailey, Connor Cole, Catherine Massey
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7523567/pdf/nihms-1534017.pdf

Abstract

New large-scale linked data are revolutionizing quantitative history and demography. This paper proposes two complementary strategies for improving inference with linked historical data: using validation variables to identify higher-quality links, and applying a simple, regression-based weighting procedure to make custom research samples more representative. We demonstrate the potential value of these strategies using the 1850-1930 Integrated Public Use Microdata Series Linked Representative Samples (IPUMS-LRS), a high-quality, publicly available linked historical dataset. We show that, although incorrect-link rates appear low in the IPUMS-LRS, researchers can reduce error rates further using validation variables. We also show how researchers can use the regression-based procedure to reweight a linked sample so that its observed characteristics balance those of a reference population.
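The two strategies named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the abstract does not give their specification, so everything below is an assumption made for illustration: the function names (keep_validated, fit_logit, link_weights), the record layout (pairs of dictionaries), the choice of a logistic model fitted by gradient ascent, and the weight formula q / p, where p is the predicted probability that a reference-population record is linked and q is the overall share linked. Weighting each linked record by q / p is one standard inverse-probability form that rebalances the linked sample toward the full reference population.

```python
import math

def keep_validated(links, field):
    """Keep links whose two records agree on a validation variable.

    `links` is a list of (record_a, record_b) dict pairs; `field` is a
    hypothetical variable that was NOT used to create the links.
    """
    return [(a, b) for a, b in links
            if a.get(field) is not None and a.get(field) == b.get(field)]

def _prob(beta, row):
    """Logistic probability for one covariate row (beta[0] is the intercept)."""
    z = beta[0] + sum(b * x for b, x in zip(beta[1:], row))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit(X, y, steps=5000, lr=0.5):
    """Fit a logistic regression of y on X by batch gradient ascent."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(steps):
        grad = [0.0] * (k + 1)
        for row, yi in zip(X, y):
            err = yi - _prob(beta, row)
            grad[0] += err
            for j, x in enumerate(row):
                grad[j + 1] += err * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def link_weights(X, linked):
    """Return q / p for each linked record and None for unlinked records.

    X holds covariates for every record in the reference population;
    linked[i] is 1 if record i was successfully linked, else 0.
    """
    q = sum(linked) / len(linked)   # overall share of records linked
    beta = fit_logit(X, linked)     # model of who gets linked
    return [q / _prob(beta, row) if is_linked else None
            for row, is_linked in zip(X, linked)]
```

In use, a researcher would first filter candidate links with keep_validated, then pass the reference population's covariates and the resulting linked indicator to link_weights, and finally apply the weights in any estimate computed on the linked sample. Records from groups that are hard to link receive weights above 1, and records from easily linked groups receive weights below 1.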

Source journal: Historical Methods (CiteScore: 3.20; self-citation rate: 7.10%; articles published: 13)
About the journal: Historical Methods reaches an international audience of social scientists concerned with historical problems. It explores interdisciplinary approaches to new data sources, new approaches to older questions and material, and practical discussions of computer and statistical methodology, data collection, and sampling procedures. The journal includes the following features: “Evidence Matters” emphasizes how to find, decipher, and analyze evidence whether or not that evidence is meant to be quantified. “Database Developments” announces major new public databases or large alterations in older ones, discusses innovative ways to organize them, and explains new ways of categorizing information.