V. Katsouros, V. Papavassiliou, Fotini Simistira, B. Gatos
{"title":"Recognition of Greek Polytonic on Historical Degraded Texts Using HMMs","authors":"V. Katsouros, V. Papavassiliou, Fotini Simistira, B. Gatos","doi":"10.1109/DAS.2016.60","DOIUrl":null,"url":null,"abstract":"Optical Character Recognition (OCR) of ancient Greek polytonic scripts is a challenging task due to the large number of character classes, resulting from variations of diacritical marks on the vowel letters. Classical OCR systems require a character segmentation phase, which in the case of Greek polytonic scripts is the main source of errors that finally affects the overall OCR performance. This paper suggests a character segmentation free HMM-based recognition system and compares its performance with other commercial, open source, and state-of-the art OCR systems. The evaluation has been carried out on a challenging novel dataset of Greek polytonic degraded texts and has shown that HMM-based OCR yields character and word level error rates of 8.61% and 25.30% respectively, which outperforms most of the available OCR systems and it is comparable with the performance of the state-of-the-art system based on LSTM Networks proposed recently.","PeriodicalId":197359,"journal":{"name":"2016 12th IAPR Workshop on Document Analysis Systems (DAS)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 12th IAPR Workshop on Document Analysis Systems (DAS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DAS.2016.60","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 8
Abstract
Optical Character Recognition (OCR) of ancient Greek polytonic scripts is a challenging task due to the large number of character classes, resulting from variations of diacritical marks on the vowel letters. Classical OCR systems require a character segmentation phase, which in the case of Greek polytonic scripts is the main source of errors that finally affects the overall OCR performance. This paper suggests a character segmentation free HMM-based recognition system and compares its performance with other commercial, open source, and state-of-the art OCR systems. The evaluation has been carried out on a challenging novel dataset of Greek polytonic degraded texts and has shown that HMM-based OCR yields character and word level error rates of 8.61% and 25.30% respectively, which outperforms most of the available OCR systems and it is comparable with the performance of the state-of-the-art system based on LSTM Networks proposed recently.