Document-Zone Classification in Torn Documents
Authors: S. Chanda, K. Franke, U. Pal
Venue: 2010 12th International Conference on Frontiers in Handwriting Recognition
Published: 2010-11-16
DOI: 10.1109/ICFHR.2010.12 (https://doi.org/10.1109/ICFHR.2010.12)
Citations: 3
Abstract
Arbitrary orientation and sparse data content are common characteristics of torn documents. To ensure accuracy and reliability in computer-based analysis, content-zone segmentation is required. In our previous work, we studied the segmentation of handwritten and printed text. A questioned document piece in the form of an office note, however, might also contain non-text data such as logos, graphics, and pictures. Hence a more precise content-zone classification is required. In this paper we propose a two-tier approach for segmenting non-text, handwritten text, and printed text. The first tier discriminates between text and non-text regions. The second tier classifies each text zone identified by the first tier as handwritten or printed. Gabor features are used in Tier-1 and chain-code features in Tier-2. Using an SVM classifier, we correctly identified 97.65% of the 31,227 text regions in our current test data. Among all identified text regions, the proposed approach correctly identified 98.69% of the printed text and 96.39% of the handwritten text.
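The two-tier routing described in the abstract can be illustrated with a minimal sketch. It makes several assumptions that do not come from the paper: the chain-code feature is taken to be a normalized histogram of Freeman 8-direction codes along a closed contour (the abstract does not define the exact feature), the Tier-1 Gabor features are left out, and the two classifiers are hypothetical callables (`is_text`, `is_handwritten`) standing in for trained SVMs.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Point = Tuple[int, int]

# Freeman 8-direction codes keyed by the step (dx, dy) between neighbouring
# contour points. Code 0 points along +x and codes increase counter-clockwise,
# with y growing upward (a convention assumed for this sketch).
FREEMAN = {
    (1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
    (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7,
}


def chain_code(contour: Sequence[Point]) -> List[int]:
    """Return the Freeman chain code of a closed, 8-connected contour."""
    n = len(contour)
    codes = []
    for i in range(n):
        x0, y0 = contour[i]
        x1, y1 = contour[(i + 1) % n]  # wrap around to close the contour
        codes.append(FREEMAN[(x1 - x0, y1 - y0)])
    return codes


def chain_code_histogram(contour: Sequence[Point]) -> List[float]:
    """Return the normalized 8-bin histogram of chain-code directions."""
    codes = chain_code(contour)
    counts = Counter(codes)
    return [counts.get(d, 0) / len(codes) for d in range(8)]


def classify_zone(
    zone: object,
    is_text: Callable[[object], bool],         # hypothetical Tier-1 classifier
    is_handwritten: Callable[[object], bool],  # hypothetical Tier-2 classifier
) -> str:
    """Route a zone through the two tiers and return its label."""
    if not is_text(zone):       # Tier-1: text vs. non-text
        return "non-text"
    if is_handwritten(zone):    # Tier-2: only reached for text zones
        return "handwritten"
    return "printed"


if __name__ == "__main__":
    square = [(0, 0), (1, 0), (1, 1), (0, 1)]
    print(chain_code(square))
    print(chain_code_histogram(square))
```

Passing the classifiers as callables keeps the cascade independent of the feature extractors and models, so Tier-2 is only evaluated for zones that Tier-1 accepts as text, as the abstract describes.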