Links and legibility: Making sense of historical US Census automated linking methods *
Authors: Arkadev Ghosh, S. Hwang, Munir Squires
Journal: Journal of Business & Economic Statistics
Publication date: 2023-04-24
DOI: 10.1080/07350015.2023.2205918 (https://doi.org/10.1080/07350015.2023.2205918)
How does handwriting legibility affect the performance of algorithms that link individuals across census rounds? We propose a measure of legibility, which we implement at scale for the 1940 US Census, and find strikingly wide variation in enumeration-district-level legibility. Using boundary discontinuities in enumeration districts, we estimate the causal effect of low legibility on the quality of linked samples, measured by linkage rates and the share of validated links. Our estimates imply that, across eight linking algorithms, perfect legibility would increase the linkage rate by 5 to 10 percentage points. Improvements in transcription could substantially increase the quality of linked samples.

*We thank Santiago Pérez and seminar participants at the Midwest Economic Association conference, the Western Economic Association virtual international conference, and the UBC Econometrics lunch for their valuable comments. This research was undertaken thanks to funding from the Canada Excellence Research Chairs program awarded to Dr. Erik Snowberg in Data-Intensive Methods in Economics. Correspondence can be addressed to hwangii@mail.ubc.ca.

†briq: Institute on Behavior and Inequality
‡University of British Columbia
§University of British Columbia