Character Segmentation from Handwritten Gujarati Isolated Words Using Deep Learning

2021 IEEE 18th India Council International Conference (INDICON), Pub Date: 2021-12-19, DOI: 10.1109/INDICON52576.2021.9691590

Riya P. Javia, Mukesh M Goswami, S. Mitra


Citations: 1

Abstract


Character Segmentation from Handwritten Gujarati isolated words using Deep Learning

Information retrieval from scanned digital copies of handwritten documents is challenging, especially for Indian scripts such as Gujarati, because of joint and conjunct characters, matras, cursive writing, and varying character sizes. Document image retrieval follows one of two approaches: recognition-based or recognition-free. The two approaches differ in the level of segmentation they rely on. There are two levels of segmentation: fine-grain and coarse-grain. In fine-grain segmentation, the base character and its matras are treated as separate segmentation units. In coarse-grain segmentation, the base character and its matras are treated as a single segmentation unit. Segmentation accuracy strongly affects the quality of information retrieval, and this research aims to address these issues. Deep learning has been very effective in many domains but has seen little use in this one. We propose a coarse-grain segmentation method using the object detection model Faster R-CNN and a fine-grain segmentation method that combines Connected Component Analysis with Faster R-CNN. The dataset used to train these models was annotated manually using the LabelImg tool.
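The abstract does not say how the Connected Component Analysis step is implemented or how its output is combined with Faster R-CNN. As a rough illustration of the general technique only, the following Python sketch labels 8-connected ink regions in a binarized word image and returns one bounding box per region. The function name `connected_component_boxes`, the list-of-rows image format, the choice of 8-connectivity, and the toy image are all assumptions made for this example, not details taken from the paper.

```python
from collections import deque


def connected_component_boxes(binary):
    """Return (top, left, bottom, right) boxes of 8-connected ink regions.

    `binary` is a list of rows; 1 marks an ink pixel, 0 marks background.
    This image format is an assumption made for the sketch.
    """
    height = len(binary)
    width = len(binary[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for r in range(height):
        for c in range(width):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited ink pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                # Visit all 8 neighbours (the centre pixel is already seen).
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and binary[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# Toy "word": a detached stroke in row 0 above two base shapes in rows 2-3.
toy_word = [
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 0, 1, 1],
]
boxes = connected_component_boxes(toy_word)
print(boxes)  # [(0, 1, 0, 2), (2, 0, 3, 2), (2, 4, 3, 5)]
```

In this toy image the detached stroke is reported as its own component, which is the kind of separation a fine-grain scheme needs. Strokes that touch the base character would merge into a single component, which is presumably one reason a detector such as Faster R-CNN is paired with this step.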

