Breakthroughs in historical record linking using genealogy data: The Census Tree project
Kasey Buckles, Adrian Haws, Joseph Price, Haley E.B. Wilbert
Explorations in Economic History, volume 98, Article 101717. Published 2025-10-01. DOI: 10.1016/j.eeh.2025.101717
https://www.sciencedirect.com/science/article/pii/S0014498325000646
Abstract
The Census Tree is the largest-ever database of record links among the historical U.S. censuses, with over 700 million links for people living in the United States between 1850 and 1940. To create the Census Tree, we begin with a collection of high-quality links contributed by the users of a free online genealogy platform, many of which would be difficult or impossible to find using currently available linking technologies. We then use these links as training data for a machine learning algorithm to make new matches, and incorporate other recent efforts to link the historical U.S. censuses. Finally, we introduce a procedure for filtering the links and adjudicating disagreements. Our complete Census Tree achieves match rates across adjacent censuses that are between 69 and 86% for men and between 58 and 79% for women, a major breakthrough compared to previous linking efforts. The size of the Census Tree allows researchers in the social sciences and other disciplines to construct longitudinal datasets that are highly representative of the population. We validate the accuracy of these links and provide researchers with a simple tool for choosing their preferred tradeoff between sample size and accuracy. To demonstrate the advantages of the Census Tree, we extend the work of Abramitzky, Boustan, Jácome, and Pérez (2021) to include intergenerational mobility estimates for additional immigrant nationalities and for women.
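The abstract does not spell out how links from different sources are filtered or how disagreements are adjudicated, so the following is only an illustrative sketch of one plausible approach, not the authors' procedure. It assumes each source contributes (earlier-census ID, later-census ID) pairs and keeps a link only when enough sources agree on the same target and no rival target ties for the top vote count. The source names, record IDs, the `min_votes` threshold, and both function names are hypothetical.

```python
from collections import Counter, defaultdict


def adjudicate_links(links_by_source, min_votes=2):
    """Return {origin_id: target_id} for links that pass a simple vote.

    Hypothetical rule (not taken from the paper): a link is kept when its
    target is proposed by at least `min_votes` sources and no other target
    for the same origin record has the same number of votes.
    """
    votes = defaultdict(Counter)
    for links in links_by_source.values():
        # set() ensures each source votes at most once per (origin, target).
        for origin_id, target_id in set(links):
            votes[origin_id][target_id] += 1

    accepted = {}
    for origin_id, counter in votes.items():
        ranked = counter.most_common(2)
        top_target, top_votes = ranked[0]
        tied = len(ranked) > 1 and ranked[1][1] == top_votes
        if top_votes >= min_votes and not tied:
            accepted[origin_id] = top_target
    return accepted


def match_rate(accepted, population_size):
    """Share of earlier-census records that received an accepted link."""
    return len(accepted) / population_size


# Made-up toy data: three hypothetical sources linking records a1..a5.
links_by_source = {
    "genealogy_users": [("a1", "b1"), ("a2", "b2"), ("a3", "b3")],
    "ml_model": [("a1", "b1"), ("a2", "b9"), ("a3", "b3")],
    "other_project": [("a1", "b1"), ("a2", "b2"), ("a4", "b4")],
}

for threshold in (1, 2, 3):
    kept = adjudicate_links(links_by_source, min_votes=threshold)
    print(threshold, len(kept), match_rate(kept, population_size=5))
```

Raising `min_votes` in this toy setup shrinks the linked sample while demanding more agreement per link, which mirrors in miniature the tradeoff between sample size and accuracy that the abstract says researchers can choose for themselves.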
About the journal:
Explorations in Economic History provides broad coverage of the application of economic analysis to historical episodes. The journal has a tradition of innovative applications of theory and quantitative techniques, and it explores all aspects of economic change, all historical periods, all geographical locations, and all political and social systems. The journal includes papers by economists, economic historians, demographers, geographers, and sociologists. Explorations in Economic History is the only journal where you will find "Essays in Exploration." This unique department alerts economic historians to the potential in a new area of research, surveying the recent literature and then identifying the most promising issues to pursue.