Character Segmentation from Handwritten Gujarati Isolated Words Using Deep Learning

Riya P. Javia, Mukesh M Goswami, S. Mitra
Published in: 2021 IEEE 18th India Council International Conference (INDICON)
Publication date: 2021-12-19
DOI: 10.1109/INDICON52576.2021.9691590 (https://doi.org/10.1109/INDICON52576.2021.9691590)
Citations: 1

Abstract

Information retrieval from scanned copies of handwritten documents is very challenging, especially for Indian scripts such as Gujarati, because of joint and conjunct characters, matras (dependent vowel signs), the cursive nature of the writing, and varying character sizes. Document image retrieval follows one of two approaches: recognition-based or recognition-free. The two approaches differ in the level of segmentation they rely on. There are two levels of segmentation: fine-grain and coarse-grain. In fine-grain segmentation, the base character and its matras are treated as separate segmentation units. In coarse-grain segmentation, the base character and its matras are treated as a single segmentation unit. Segmentation accuracy strongly affects the quality of information retrieval, and this research works towards addressing that problem. Deep learning has been very effective in many domains but has seen little use in this one. We propose a coarse-grain segmentation method that uses the Faster RCNN object detection model, and a fine-grain segmentation method that combines Connected Component Analysis with Faster RCNN. The dataset used to train these models was annotated manually with the LabelImg tool.
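To make the fine-grain versus coarse-grain distinction concrete, the following is a minimal sketch of Connected Component Analysis on a toy binary image. It is not the authors' implementation: the abstract does not describe how Connected Component Analysis and Faster RCNN are combined, and the coarse-grain method in the paper uses Faster RCNN rather than a merge rule. The names `connected_components`, `merge_by_column_overlap`, and `WORD` are hypothetical, and the column-overlap merge is only an illustrative assumption for showing how separate components could be grouped into one unit.

```python
from collections import deque


def connected_components(grid):
    """Return one bounding box per 8-connected group of foreground (1) pixels.

    Each box is (x_min, y_min, x_max, y_max) in inclusive pixel coordinates.
    """
    h = len(grid)
    w = len(grid[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if grid[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue = deque([(x, y)])
            seen[y][x] = True
            x0, y0, x1, y1 = x, y, x, y
            while queue:
                cx, cy = queue.popleft()
                x0, y0 = min(x0, cx), min(y0, cy)
                x1, y1 = max(x1, cx), max(y1, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        inside = 0 <= nx < w and 0 <= ny < h
                        if inside and grid[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return sorted(boxes)


def merge_by_column_overlap(boxes):
    """Hypothetical grouping rule: merge boxes whose x-ranges overlap.

    This only stands in for a coarse-grain unit in the toy example; it is an
    assumption for illustration, not the Faster RCNN method from the paper.
    """
    merged = []
    for box in sorted(boxes):
        if merged and box[0] <= merged[-1][2]:
            last = merged[-1]
            merged[-1] = (
                last[0],
                min(last[1], box[1]),
                max(last[2], box[2]),
                max(last[3], box[3]),
            )
        else:
            merged.append(box)
    return merged


# Toy "word": a mark above the first shape (a stand-in for a matra),
# the first base shape, and a second base shape to its right.
WORD = [
    [0, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 0, 1, 0],
    [1, 1, 1, 0, 0, 1, 1],
]

fine_units = connected_components(WORD)
coarse_units = merge_by_column_overlap(fine_units)
```

On this toy input, `fine_units` holds three boxes (the mark and the two base shapes), while `coarse_units` holds two, because the mark is merged with the shape beneath it. Real handwriting breaks this simple rule in both directions: touching strokes fuse into one component, and matras written beside the base character do not overlap it in columns, which is a plausible reason to pair component analysis with a learned detector.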