Christopher Blomqvist , Kerstin Enflo , Andreas Jakobsson , Kalle Åström
{"title":"Reading the ransom: Methodological advancements in extracting the Swedish Wealth Tax of 1571","authors":"Christopher Blomqvist , Kerstin Enflo , Andreas Jakobsson , Kalle Åström","doi":"10.1016/j.eeh.2022.101470","DOIUrl":null,"url":null,"abstract":"<div><p>We describe a deep learning method to read hand-written records from the 16th century. The method consists of a combination of a segmentation module and a Handwritten Text Recognition (HTR) module. The transformer-based HTR module exploits both language and image features in reading, classifying and extracting the position of each word on the page. The method is demonstrated on a unique historical document: The Swedish Wealth Tax of 1571. Results suggest that the segmentation module performs significantly better than the lay-out analysis implemented in state-of-the art programs, enabling us to trace many more text blocks correctly on each page. The HTR module has a low character error rate (CER), in addition to being able to classify words and help organize them into tabular formats. By demonstrating an automated process to transform loosely structured handwritten information from the 16th century into organized tables, our method should interest economic historians seeking to digitize and organize quantitative material from pre-industrial periods.</p></div>","PeriodicalId":47413,"journal":{"name":"Explorations in Economic History","volume":null,"pages":null},"PeriodicalIF":2.6000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Explorations in Economic History","FirstCategoryId":"98","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0014498322000481","RegionNum":1,"RegionCategory":"历史学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ECONOMICS","Score":null,"Total":0}
引用次数: 0
Abstract
We describe a deep learning method to read hand-written records from the 16th century. The method consists of a combination of a segmentation module and a Handwritten Text Recognition (HTR) module. The transformer-based HTR module exploits both language and image features in reading, classifying and extracting the position of each word on the page. The method is demonstrated on a unique historical document: The Swedish Wealth Tax of 1571. Results suggest that the segmentation module performs significantly better than the lay-out analysis implemented in state-of-the art programs, enabling us to trace many more text blocks correctly on each page. The HTR module has a low character error rate (CER), in addition to being able to classify words and help organize them into tabular formats. By demonstrating an automated process to transform loosely structured handwritten information from the 16th century into organized tables, our method should interest economic historians seeking to digitize and organize quantitative material from pre-industrial periods.
期刊介绍:
Explorations in Economic History provides broad coverage of the application of economic analysis to historical episodes. The journal has a tradition of innovative applications of theory and quantitative techniques, and it explores all aspects of economic change, all historical periods, all geographical locations, and all political and social systems. The journal includes papers by economists, economic historians, demographers, geographers, and sociologists. Explorations in Economic History is the only journal where you will find "Essays in Exploration." This unique department alerts economic historians to the potential in a new area of research, surveying the recent literature and then identifying the most promising issues to pursue.