Ruling analysis and classification of torn documents
Markus Diem, Florian Kleber, Robert Sablatnig
Proceedings of the ACM Symposium on Document Engineering, pp. 63-72, 2014-09-16
DOI: 10.1145/2644866.2644876 (https://doi.org/10.1145/2644866.2644876)
Abstract
This paper presents a ruling classification for torn documents. In contrast to state-of-the-art methods, which focus on ruling line removal, ruling lines are analyzed here for document clustering in the context of document snippet reassembly. First, a background patch is extracted from a snippet at the position that minimizes the inscribed content. A novel Fourier feature is then computed on the image patch. The patch is classified as void, lined, or checked using Support Vector Machines. Finally, accurate line localization is performed by means of projection profiles and robust line fitting. The ruling classification achieves an F-score of 0.987 on a dataset comprising real-world document snippets. In addition, line removal was evaluated on a synthetically generated dataset, where an F-score of 0.931 is achieved. This dataset is made publicly available to allow for benchmarking.
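The abstract does not specify the Fourier feature or the robust line fitting, so the following is only a minimal sketch of the general idea behind two of the named ingredients: a horizontal projection profile and a Fourier-based estimate of line spacing. All function names, the synthetic patch, and the threshold are assumptions made for illustration; this is not the authors' implementation, and SVM classification and robust fitting are left out.

```python
import numpy as np


def make_lined_patch(height=128, width=128, spacing=16, offset=5, thickness=2):
    """Hypothetical test patch: white background (1.0) with dark horizontal lines (0.0)."""
    patch = np.ones((height, width), dtype=float)
    for y in range(offset, height, spacing):
        patch[y:y + thickness, :] = 0.0
    return patch


def row_profile(patch):
    """Horizontal projection profile: mean darkness of every image row."""
    return (1.0 - patch).mean(axis=1)


def dominant_period(profile):
    """Estimate the line spacing (in rows) from the strongest non-DC Fourier peak.

    Returns None when the profile is flat, e.g. for a void (unruled) patch.
    """
    centered = profile - profile.mean()
    spectrum = np.abs(np.fft.rfft(centered))
    if spectrum[1:].max() == 0.0:
        return None
    k = int(np.argmax(spectrum[1:])) + 1  # skip the DC bin
    return len(profile) / k


def locate_lines(profile, rel_threshold=0.5):
    """Return the centre row of every run of rows above rel_threshold * max(profile)."""
    mask = profile > rel_threshold * profile.max()
    edges = np.diff(mask.astype(int))
    starts = np.flatnonzero(edges == 1) + 1   # first row of each run
    ends = np.flatnonzero(edges == -1) + 1    # one past the last row of each run
    if mask[0]:
        starts = np.r_[0, starts]
    if mask[-1]:
        ends = np.r_[ends, len(mask)]
    return (starts + ends - 1) / 2.0


patch = make_lined_patch()
profile = row_profile(patch)
period = dominant_period(profile)
centers = locate_lines(profile)
```

On the synthetic patch, `period` recovers the 16-row spacing and `centers` holds one position per ruling line. Real snippets would additionally need handling for skew, noise, and handwriting overlapping the ruling, which is where a robust line fit would come in.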