Multi-lingual City Name Recognition for Indian Postal Automation

U. Pal, Rami Kumar Roy, F. Kimura
2012 International Conference on Frontiers in Handwriting Recognition, published 2012-09-18
DOI: 10.1109/ICFHR.2012.238 (https://doi.org/10.1109/ICFHR.2012.238)
Citations: 32

Abstract

Under the three-language formula, the destination address block of a postal document in an Indian state is generally written in three languages: English, Hindi and the state's official language. Our statistical analysis found that 12.37%, 76.32% and 10.21% of postal documents are written in Bangla, English and Devanagari script, respectively. Because these scripts are intermixed in postal addresses, it is very difficult to identify the script in which a city name is written. To avoid this script identification problem, we propose a lexicon-driven method for multi-lingual (English, Hindi and Bangla) city name recognition for Indian postal automation. In the proposed scheme, slant correction is first applied to handle the slanted handwriting of different individuals. Next, a water reservoir concept is applied to pre-segment the slant-corrected city names into possible primitive components (characters or their parts). The pre-segmented components of a city name are then merged into possible characters to obtain the best city name using the lexicon information. To merge these primitive components into characters and find the optimum character segmentation, dynamic programming (DP) is applied, using the total likelihood of the characters of a city name as the objective function. We tested our system on 16132 Indian trilingual city names and obtained 92.25% overall recognition accuracy.
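
The merging step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`best_segmentation`, `recognize`), the `max_merge` limit, and the toy scorer are all assumptions made for illustration, and a real system would replace the scorer with a trained character classifier. The sketch assumes each lexicon character is formed from between 1 and `max_merge` consecutive primitives and that the objective is the sum of per-character log-likelihoods.

```python
import math
from typing import Callable, List, Sequence, Tuple

NEG_INF = float("-inf")
Scorer = Callable[[Sequence[str], str], float]


def best_segmentation(primitives: Sequence[str], word: str, score: Scorer,
                      max_merge: int = 3) -> Tuple[float, List[Tuple[int, int]]]:
    """Return (total log-likelihood, [(start, end), ...]) for one lexicon word.

    best[j][i] is the best total score for matching the first j characters
    of `word` to the first i primitives.
    """
    n, m = len(primitives), len(word)
    best = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    back = [[0] * (n + 1) for _ in range(m + 1)]
    best[0][0] = 0.0
    for j in range(1, m + 1):
        for i in range(1, n + 1):
            # Try merging the last k primitives into character j.
            for k in range(1, min(max_merge, i) + 1):
                prev = best[j - 1][i - k]
                if prev == NEG_INF:
                    continue
                cand = prev + score(primitives[i - k:i], word[j - 1])
                if cand > best[j][i]:
                    best[j][i] = cand
                    back[j][i] = k
    if best[m][n] == NEG_INF:
        return NEG_INF, []  # the word cannot consume all primitives
    # Backtrack to recover the primitive span assigned to each character.
    cuts: List[Tuple[int, int]] = []
    i = n
    for j in range(m, 0, -1):
        k = back[j][i]
        cuts.append((i - k, i))
        i -= k
    cuts.reverse()
    return best[m][n], cuts


def recognize(primitives: Sequence[str], lexicon: Sequence[str],
              score: Scorer, max_merge: int = 3) -> str:
    """Pick the lexicon word with the highest total likelihood."""
    return max(lexicon,
               key=lambda w: best_segmentation(primitives, w, score, max_merge)[0])


# Hypothetical stand-in for a character classifier: primitives are plain
# strings, and "d" is assumed to be over-segmented into "c" + "l".
TOY_SHAPES = {"d": "cl"}


def toy_score(group: Sequence[str], char: str) -> float:
    matches = "".join(group) == TOY_SHAPES.get(char, char)
    return math.log(0.9) if matches else math.log(0.01)


prims = ["c", "l", "e", "l", "h", "i"]
print(recognize(prims, ["agra", "delhi", "pune"], toy_score))  # prints: delhi
```

Because every lexicon word is scored against the same primitives, no script identification step is needed in this sketch: words from English, Hindi and Bangla lexicons could in principle be scored side by side. Whether the original system normalizes the total likelihood by word length is not stated in the abstract, so the sketch compares raw totals.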