Andreas Fischer, M. Baechler, A. Garz, M. Liwicki, R. Ingold
A Combined System for Text Line Extraction and Handwriting Recognition in Historical Documents
2014 11th IAPR International Workshop on Document Analysis Systems, published 2014-04-07. DOI: 10.1109/DAS.2014.51
Citation count: 24
Abstract
Automated reading of historical handwriting is needed to search and browse ancient manuscripts in digital libraries by their textual content. In this paper, we present a combined system for text localization and transcription in page images. It includes flexible, learning-based methods for layout analysis and handwriting recognition, which were developed in the context of the Swiss research project HisDoc. A comprehensive experimental evaluation on the medieval Parzival database demonstrates a promising word recognition accuracy of 93.0% with a closed vocabulary. To harmonize the evaluation of the two document analysis tasks, we introduce a novel evaluation measure for text line extraction that takes substitution, deletion, and insertion errors into account.
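The abstract does not spell out the new line extraction measure. For orientation only, the sketch below shows the familiar accuracy definition from speech and handwriting recognition that also counts substitutions (S), deletions (D) and insertions (I): accuracy = (N - S - D - I) / N, where N is the number of reference tokens and the counts come from a minimum-cost alignment. This is an illustrative assumption, not the authors' measure. The function names `count_errors` and `accuracy` are hypothetical, and so is the pluggable `equal` predicate. That predicate is what would let the same alignment compare recognized words (exact equality) or, hypothetically, detected versus ground-truth text lines (for example, an overlap threshold).

```python
import operator
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def count_errors(
    reference: Sequence[T],
    hypothesis: Sequence[T],
    equal: Callable[[T, T], bool] = operator.eq,
) -> Tuple[int, int, int]:
    """Return (substitutions, deletions, insertions) of a minimum-cost alignment.

    Hypothetical helper: a plain Levenshtein alignment with unit costs.
    """
    n, m = len(reference), len(hypothesis)
    # cost[i][j] = (total_errors, S, D, I) for reference[:i] vs hypothesis[:j]
    cost = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = (i, 0, i, 0)  # every reference token deleted
    for j in range(1, m + 1):
        cost[0][j] = (j, 0, 0, j)  # every hypothesis token inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            t, s, d, ins = cost[i - 1][j - 1]
            if equal(reference[i - 1], hypothesis[j - 1]):
                diag = (t, s, d, ins)  # match: no error
            else:
                diag = (t + 1, s + 1, d, ins)  # substitution
            t, s, d, ins = cost[i - 1][j]
            up = (t + 1, s, d + 1, ins)  # deletion of a reference token
            t, s, d, ins = cost[i][j - 1]
            left = (t + 1, s, d, ins + 1)  # insertion of a hypothesis token
            # Tuples compare by total errors first, so min() keeps the cheapest path.
            cost[i][j] = min(diag, up, left)
    _, s, d, ins = cost[n][m]
    return s, d, ins


def accuracy(
    reference: Sequence[T],
    hypothesis: Sequence[T],
    equal: Callable[[T, T], bool] = operator.eq,
) -> float:
    """(N - S - D - I) / N; can be negative if insertions are numerous."""
    if not reference:
        raise ValueError("reference must contain at least one token")
    s, d, ins = count_errors(reference, hypothesis, equal)
    n = len(reference)
    return (n - s - d - ins) / n


if __name__ == "__main__":
    ref = "the quick brown fox".split()
    hyp = "the quick brown fax jumps".split()
    print(count_errors(ref, hyp), accuracy(ref, hyp))
```

In the usage example, one substitution ("fox" to "fax") and one insertion ("jumps") give an accuracy of (4 - 1 - 0 - 1) / 4 = 0.5. Because insertions are subtracted, a system that over-segments a page into spurious lines or words is penalized, unlike a simple recall figure.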