Verónica Romero, A. Toselli, Joan Andreu Sánchez, E. Vidal
Handwriting Transcription and Keyword Spotting in Historical Daily Records Documents
2016 12th IAPR Workshop on Document Analysis Systems (DAS), published 2016-04-01
DOI: 10.1109/DAS.2016.70 (https://doi.org/10.1109/DAS.2016.70)
Citations: 3
Abstract
Historical records of daily activities offer an intriguing look into life in the past. These documents contain information useful for demographic studies and genealogical research. However, automatic processing of historical documents has mostly focused on single works of literature and less on daily records, which tend to have a distinct layout, structure, and vocabulary. This paper presents a study of the capability of state-of-the-art handwritten text recognition and keyword spotting systems when applied to this kind of document. A relatively small set of handwritten birth records registered in Wien in the 16th century is used in the experiments. A word accuracy of about 70% and an average precision (AP) of 0.74 are achieved for plain image transcription and keyword spotting, respectively. Given the many difficulties these handwritten documents exhibit, these preliminary results are quite encouraging.