Models for decryption of historical shorthand documents
A. Rogov, Mikhail B. Gippiev, Ivan Shterkel
2015 International Conference "Stability and Control Processes" in Memory of V.I. Zubov (SCP), published 2015-12-03
DOI: https://doi.org/10.1109/SCP.2015.7342239
Citation count: 1
Abstract
This article presents methods for the recognition of historical shorthand documents. We distinguish the following tasks: binarization, clustering, line recognition, and determination of symbol types (main, superscript, subscript). Each method is evaluated in terms of recall, precision, and F-measure. The modified threshold method proved to be the best for binarizing shorthand documents. We propose the following methods for clustering graphic symbols: comparison of segment lengths, comparison of projections, and the method of baskets. The method of baskets achieves the best result. We also present algorithms for line recognition and symbol classification. Line recognition is performed using two methods: nearest neighbour and relations graph construction. Symbol classification is done by the single and double approximation methods and their modifications. Relations graph construction gives the best line segmentation result, and the modified double approximation method gives the best result in determining symbol types.
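The abstract names recall, precision, and F-measure as the evaluation criteria but does not say how they are computed. The following is a minimal sketch of the standard definitions, assuming the balanced F-measure (F1) and raw counts of true positives, false positives, and false negatives. The function name `precision_recall_f` and the example counts are hypothetical and are not taken from the paper.

```python
def precision_recall_f(true_positives, false_positives, false_negatives):
    """Return (precision, recall, F-measure) from raw counts.

    Assumes the balanced F-measure (harmonic mean of precision and recall);
    the paper may use a different weighting.
    """
    predicted = true_positives + false_positives  # items the method reported
    actual = true_positives + false_negatives     # items in the ground truth
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Hypothetical counts for illustration only, not results from the paper.
p, r, f = precision_recall_f(true_positives=8, false_positives=2, false_negatives=2)
print(p, r, f)
```

The same function could be applied to each task (for example, counting correctly segmented lines or correctly typed symbols), provided a ground-truth annotation is available for comparison.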