Page-level script identification from multi-script handwritten documents

P. Singh, S. Dalal, R. Sarkar, M. Nasipuri
Proceedings of the 2015 Third International Conference on Computer, Communication, Control and Information Technology (C3IT), published 2015-03-16
DOI: 10.1109/C3IT.2015.7060113 (https://doi.org/10.1109/C3IT.2015.7060113)
Citations: 21

Abstract

Script identification has long been a precursor to many Optical Character Recognition (OCR) processes in multilingual document environments. It has numerous applications in document image analysis, such as document sorting, indexing, retrieval, and translation. In this paper, we develop a page-level script identification technique for handwritten documents using texture features. The texture features are extracted from the document pages using the Gray Level Co-occurrence Matrix (GLCM). The proposed technique is evaluated on four scripts, namely Bangla, Devnagari, Telugu, and Roman, using multiple classifiers. Comparing their identification accuracies shows that the Multi Layer Perceptron (MLP) classifier performs best. The experimental results demonstrate the effectiveness of GLCM features for identifying handwritten scripts. Experiments are conducted on a total of 120 document pages, and the overall accuracy of the system is 91.48%. Although the system is evaluated on a limited dataset, the result can be considered satisfactory given the complexity of the scripts.
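To make the GLCM step concrete, here is a minimal sketch of how a co-occurrence matrix and a few texture descriptors can be computed from a quantized grayscale image. The abstract does not state which offsets, gray-level quantization, or descriptors the authors used, so the single horizontal offset, the four gray levels, and the four descriptors (contrast, energy, homogeneity, entropy) are illustrative assumptions. The names `glcm`, `texture_features`, and `page` are hypothetical.

```python
import math


def glcm(image, dx=1, dy=0, levels=4, symmetric=True):
    """Return a normalized levels x levels co-occurrence matrix for one pixel offset.

    `image` is a list of rows of integer gray levels in [0, levels).
    The offset (dx, dy) and the symmetric counting are assumptions for illustration.
    """
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                if symmetric:
                    # Count the pair in both directions so the matrix is symmetric.
                    counts[j][i] += 1
    total = sum(map(sum, counts))
    if total == 0:
        return counts
    return [[v / total for v in row] for row in counts]


def texture_features(p):
    """Compute a few common Haralick-style descriptors from a normalized GLCM."""
    contrast = energy = homogeneity = entropy = 0.0
    for i, row in enumerate(p):
        for j, v in enumerate(row):
            contrast += (i - j) ** 2 * v
            energy += v * v
            homogeneity += v / (1 + (i - j) ** 2)
            if v > 0:
                entropy -= v * math.log2(v)
    return {
        "contrast": contrast,
        "energy": energy,
        "homogeneity": homogeneity,
        "entropy": entropy,
    }


# Toy 4x4 "page" already quantized to 4 gray levels (hypothetical data).
page = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
features = texture_features(glcm(page, dx=1, dy=0, levels=4))
print(features)
```

In a page-level pipeline of the kind the abstract describes, a feature vector like this (possibly concatenated over several offsets) would be computed per document page and passed to a classifier such as an MLP. The classifier stage is omitted here because the abstract gives no details of its configuration.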