Creation of data resources and design of an evaluation test bed for Devanagari script recognition

S. Setlur, Suryaprakash Kompalli, V. Ramanaprasad, V. Govindaraju
DOI: 10.1109/RIDE.2003.1249846 (https://doi.org/10.1109/RIDE.2003.1249846)
Published: 2003-03-10
Source: Proceedings. Seventeenth Workshop on Parallel and Distributed Simulation
Citations: 19

Abstract

The Indian subcontinent has a large number of languages, dialects, and scripts, with Devanagari being the primary and most widely used script. To date, much of the research on Devanagari optical character recognition (OCR) has been restricted to a handful of groups. As a result, techniques have not yet been widely disseminated or independently evaluated, and automated evaluation tools are not available because there is no standard representation of ground-truth and result data. A key reason for the absence of sustained research efforts in off-line Devanagari OCR appears to be the paucity of data resources: ground-truthed data for words and characters, on-line dictionaries, corpora of text documents, and reliable, standardized statistical analyses and evaluation tools are currently lacking. The creation of such data resources will undoubtedly provide a much-needed fillip to researchers working on Devanagari OCR. This paper describes a National Science Foundation-sponsored project under the International Digital Libraries program to create data resources that will facilitate the development of Devanagari OCR technology and provide a standardized test bed and evaluation tools for Devanagari script recognition.