Combining spatial and transform features for the recognition of middle zone components of Telugu
A. Sastry, S. Lanka, P. P. Clee, L. Reddy
TENCON 2008 - 2008 IEEE Region 10 Conference, 2008-11-01
DOI: 10.1109/TENCON.2008.4766721 (https://doi.org/10.1109/TENCON.2008.4766721)
Abstract
The transition from a traditional paper-based society to a truly paperless information society requires a large body of knowledge and suitable algorithmic approaches in Document Image Processing. Progress in Indic script analysis has gained momentum in recent years. Individual characters in these scripts undergo a large number of shape variations because of the complex canonical structure, which follows the phonetic sequence. The main approach found in the literature is to separate the individual components and establish the relationships between them during recognition. In this paper, an attempt is made to extract Middle Zone Components from Telugu document images by combining the Component model and the Zone Separation model. Middle zone components are recognized with a novel technique that combines spatial features, which capture topological characteristics, with a transform feature for effective classification. A tree classifier is adopted with the Euler Number, Compact Ratio and Zernike moment as features. An unsupervised training strategy is adopted to identify the Middle Zone components. The optimum size of the training set is evaluated for various font sizes.
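As an illustration of the two spatial features named above, the following is a minimal sketch, not the authors' implementation. It assumes a component is given as a binary grid (nested lists of 0/1). The Euler number (connected components minus holes) is computed with the standard bit-quad counting method. The abstract does not define the Compact Ratio, so `compact_ratio` here uses an assumed definition: foreground pixel count divided by bounding-box area. The function names are hypothetical, and the Zernike moment feature and the tree classifier are not shown.

```python
def euler_number(img, connectivity=8):
    """Euler number (components minus holes) of a non-empty binary grid,
    computed by counting 2x2 bit-quad patterns."""
    rows, cols = len(img), len(img[0])

    def px(r, c):
        # Pixels outside the grid are treated as background.
        return 1 if 0 <= r < rows and 0 <= c < cols and img[r][c] else 0

    q1 = q3 = qd = 0
    # Slide a 2x2 window over the grid, including a one-pixel background border.
    for r in range(-1, rows):
        for c in range(-1, cols):
            tl, tr = px(r, c), px(r, c + 1)
            bl, br = px(r + 1, c), px(r + 1, c + 1)
            s = tl + tr + bl + br
            if s == 1:
                q1 += 1          # exactly one foreground pixel
            elif s == 3:
                q3 += 1          # exactly three foreground pixels
            elif s == 2 and tl == br:
                qd += 1          # two foreground pixels on a diagonal
    sign = -1 if connectivity == 8 else 1
    return (q1 - q3 + sign * 2 * qd) // 4


def compact_ratio(img):
    """ASSUMED definition: foreground area / bounding-box area."""
    coords = [(r, c) for r, row in enumerate(img)
              for c, v in enumerate(row) if v]
    if not coords:
        return 0.0
    rs = [r for r, _ in coords]
    cs = [c for _, c in coords]
    box_area = (max(rs) - min(rs) + 1) * (max(cs) - min(cs) + 1)
    return len(coords) / box_area


# A 3x3 ring: one component with one hole.
ring = [[1, 1, 1],
        [1, 0, 1],
        [1, 1, 1]]
print(euler_number(ring), compact_ratio(ring))
```

A feature such as the Euler number is cheap to compute and separates glyphs by their number of loops, which is one plausible reason to test it near the root of a tree classifier before the costlier transform feature; that ordering is a guess and is not stated in the abstract.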