Breakthroughs in historical record linking using genealogy data: The Census Tree project

Impact factor 1.7; CAS Zone 1 (History); JCR Q1 (Economics)
Kasey Buckles, Adrian Haws, Joseph Price, Haley E.B. Wilbert
Explorations in Economic History, volume 98, Article 101717. Published 2025-10-01.
DOI: 10.1016/j.eeh.2025.101717
https://www.sciencedirect.com/science/article/pii/S0014498325000646

Abstract

The Census Tree is the largest-ever database of record links among the historical U.S. censuses, with over 700 million links for people living in the United States between 1850 and 1940. To create the Census Tree, we begin with a collection of high-quality links contributed by the users of a free online genealogy platform, many of which would be difficult or impossible to find using currently available linking technologies. We then use these links as training data for a machine learning algorithm to make new matches, and incorporate other recent efforts to link the historical U.S. censuses. Finally, we introduce a procedure for filtering the links and adjudicating disagreements. Our complete Census Tree achieves match rates across adjacent censuses that are between 69 and 86% for men and between 58 and 79% for women, a major breakthrough compared to previous linking efforts. The size of the Census Tree allows researchers in the social sciences and other disciplines to construct longitudinal datasets that are highly representative of the population. We validate the accuracy of these links and provide researchers with a simple tool for choosing their preferred tradeoff between sample size and accuracy. To demonstrate the advantages of the Census Tree, we extend the work of Abramitzky, Boustan, Jácome, and Pérez (2021) to include intergenerational mobility estimates for additional immigrant nationalities and for women.
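The abstract does not spell out how links from different sources are filtered or how disagreements are adjudicated. Purely as an illustration of the general idea, the sketch below resolves conflicting candidate links by a simple vote across methods and shows how a stricter agreement threshold trades sample size for confidence. The voting rule, the function names (`adjudicate`, `match_rate`), the method labels, and the toy record identifiers are all assumptions made for this example, not the authors' procedure.

```python
from collections import Counter, defaultdict

# Toy candidate links: (linking_method, record_in_earlier_census, record_in_later_census).
# Every identifier and method label here is invented for illustration.
candidate_links = [
    ("genealogy", "A1", "B1"),
    ("ml_model", "A1", "B1"),
    ("other_project", "A1", "B9"),
    ("ml_model", "A2", "B2"),
    ("genealogy", "A3", "B3"),
    ("other_project", "A3", "B3"),
    ("ml_model", "A4", "B4"),
    ("other_project", "A4", "B5"),
]


def adjudicate(links, min_votes=1):
    """Keep, for each earlier record, the later record with the most votes.

    Hypothetical rule: exact ties between different destinations are
    dropped, as are winners supported by fewer than `min_votes` methods.
    """
    votes = defaultdict(Counter)
    for _method, origin, destination in links:
        votes[origin][destination] += 1

    resolved = {}
    for origin, counter in votes.items():
        ranked = counter.most_common(2)
        best_destination, best_count = ranked[0]
        tied = len(ranked) > 1 and ranked[1][1] == best_count
        if not tied and best_count >= min_votes:
            resolved[origin] = best_destination
    return resolved


def match_rate(resolved, population_size):
    """Share of earlier-census records that received a link."""
    return len(resolved) / population_size


if __name__ == "__main__":
    # Assume 5 people in the earlier census are eligible to be linked.
    for threshold in (1, 2):
        kept = adjudicate(candidate_links, min_votes=threshold)
        print(threshold, len(kept), match_rate(kept, 5))
```

With this toy data, raising `min_votes` from 1 to 2 keeps only the links that two methods agree on, so fewer records are linked but each surviving link has more support. That is the kind of sample-size versus accuracy choice the abstract mentions. The sketch does not enforce one-to-one matching on the later census, which a real linking pipeline would need to handle.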

Source journal

CiteScore: 2.50
Self-citation rate: 8.70%
Articles published: 27

About the journal: Explorations in Economic History provides broad coverage of the application of economic analysis to historical episodes. The journal has a tradition of innovative applications of theory and quantitative techniques, and it explores all aspects of economic change, all historical periods, all geographical locations, and all political and social systems. The journal includes papers by economists, economic historians, demographers, geographers, and sociologists. Explorations in Economic History is the only journal where you will find "Essays in Exploration." This unique department alerts economic historians to the potential in a new area of research, surveying the recent literature and then identifying the most promising issues to pursue.