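The Zone-Based Projection Distance Feature Extraction Method for Handwritten Numeral/Mixed Numerals Recognition of Indian Scripts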

2010 12th International Conference on Frontiers in Handwriting Recognition. Pub Date: 2010-11-16. DOI: 10.1109/ICFHR.2010.101

S. Rajashekararadhya, P. Ranjan


Citations: 5


The Zone-Based Projection Distance Feature Extraction Method for Handwritten Numeral/Mixed Numerals Recognition of Indian Scripts

Handwriting recognition has long been a challenging task in image processing and pattern recognition. India is a multi-lingual, multi-script country with eighteen accepted official scripts and over a hundred regional languages. The choice of feature extraction method is probably the most important factor in achieving high recognition performance. In this study we propose a zone-based feature extraction algorithm for the recognition of off-line handwritten numerals of south-Indian scripts. The character centroid is computed and the character/numeral image (50×50 pixels) is divided into 25 equal zones of 10×10 pixels each. Within each zone, the average distance from the character centroid to the pixels present in each zone column is computed, giving 10 features per zone; repeating this for every zone in the numeral image gives 250 features. The character centroid is then computed again and the image is divided into 50 equal zones (5×10). The average distance from the image centroid to the pixels present in each zone is computed, giving a further 50 features. If a zone or zone column contains no foreground pixels, its value in the feature vector is set to zero. In total, 300 features are extracted for classification and recognition. Nearest neighbor, feed-forward back-propagation neural network, and support vector machine classifiers are used for classification and recognition. Using the support vector machine, we obtained recognition rates of 98.05% for Kannada numerals, 95.1% for Tamil numerals, 97.2% for Telugu numerals, and 95.7% for Malayalam numerals.
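
The feature extraction described in the abstract can be sketched roughly as follows. This is a minimal illustration rather than the authors' implementation, and it rests on several assumptions the abstract does not settle: the input is a 50×50 binary array whose nonzero entries are foreground pixels, the centroid is the mean foreground coordinate, distances are Euclidean, zones are visited in row-major order, and the "5×10" zones are 5 rows tall by 10 columns wide. The function names (`centroid`, `mean_distance`, `zone_features`) are hypothetical.

```python
import numpy as np

def centroid(img):
    """Mean (row, col) of the foreground pixels; image centre if there are none."""
    rows, cols = np.nonzero(img)
    if rows.size == 0:
        return (img.shape[0] - 1) / 2.0, (img.shape[1] - 1) / 2.0
    return rows.mean(), cols.mean()

def mean_distance(img, cy, cx, r0, r1, c0, c1):
    """Average Euclidean distance from (cy, cx) to the foreground pixels in
    img[r0:r1, c0:c1]; 0.0 when that region holds no foreground pixels."""
    rows, cols = np.nonzero(img[r0:r1, c0:c1])
    if rows.size == 0:
        return 0.0
    # Shift local indices back to image coordinates before measuring.
    return float(np.hypot(rows + r0 - cy, cols + c0 - cx).mean())

def zone_features(img):
    """Return a 300-element feature vector for a 50x50 binary image."""
    img = np.asarray(img)
    if img.shape != (50, 50):
        raise ValueError("expected a 50x50 image")
    cy, cx = centroid(img)
    feats = []
    # Pass 1: 25 zones of 10x10 pixels, one feature per zone column (250 total).
    for r0 in range(0, 50, 10):
        for c0 in range(0, 50, 10):
            for c in range(c0, c0 + 10):
                feats.append(mean_distance(img, cy, cx, r0, r0 + 10, c, c + 1))
    # Pass 2: 50 zones of 5x10 pixels (assumed orientation), one feature per zone.
    for r0 in range(0, 50, 5):
        for c0 in range(0, 50, 10):
            feats.append(mean_distance(img, cy, cx, r0, r0 + 5, c0, c0 + 10))
    return np.array(feats)
```

Calling `zone_features(img)` on a size-normalized numeral image yields one row of a feature matrix that could then be passed to any of the classifiers named in the abstract. Empty zones and zone columns contribute zeros, matching the rule stated above.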
