Optical Character Recognition for Coptic fonts: A multi-source approach for scholarly editions
E. Lincke, Kirill Bulert, Marco Büchler
Proceedings of the 3rd International Conference on Digital Access to Textual Cultural Heritage, 2019-05-08
DOI: https://doi.org/10.1145/3322905.3322931
Abstract
In this paper, we show that the OCR engine Ocropy can be trained on fonts used in rather complex and varied Coptic typesetting. For each of the three fonts presented here, we used several texts from scholarly editions that follow different philological and editorial standards, drawn from two dialects of Coptic (Bohairic and Sahidic). Despite the complexity of the training data, we observed accuracy rates of 97.5%, and for one font as high as 99%.
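The abstract does not say how the accuracy rates were measured. As an illustration only, the sketch below assumes a character-level accuracy, computed as one minus the edit distance between OCR output and ground truth divided by the ground-truth length. The function names `levenshtein` and `character_accuracy` are hypothetical and are not part of Ocropy.

```python
from typing import Iterable, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_accuracy(pairs: Iterable[Tuple[str, str]]) -> float:
    """Assumed metric: 1 - (total edit errors / total ground-truth characters).

    `pairs` holds (ground_truth_line, ocr_output_line) tuples.
    """
    errors = 0
    total = 0
    for truth, predicted in pairs:
        errors += levenshtein(truth, predicted)
        total += len(truth)
    if total == 0:
        raise ValueError("ground truth contains no characters")
    return 1.0 - errors / total


if __name__ == "__main__":
    # One wrong letter out of four Coptic letters in the ground truth.
    sample = [("ⲁⲃⲅⲇ", "ⲁⲃⲅⲉ")]
    print(f"character accuracy: {character_accuracy(sample):.2%}")
```

Python compares strings code point by code point, so under this assumed metric a combining mark such as a supralinear stroke counts as a character of its own.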