MatriVasha: Bangla Handwritten Compound Character Dataset and Recognition

arXiv: Computer Vision and Pattern Recognition · Pub Date: 2021-08-06 · DOI: 10.17632/V39PC2G2WP.1

J. Ferdous, Suvrajit Karmaker, AKM Shahariar Azad Rabby, S. A. Hossain

Abstract

Recognition of handwritten Bangla compound characters has been an important problem for many years. In recent years, application-oriented research in machine learning and deep learning has gained interest, and handwriting recognition stands out because of its wide range of applications, such as Bangla OCR. MatriVasha is a project that can recognize several handwritten Bangla compound characters. Compound character recognition is currently an important topic because of its varied applications; for example, it helps digitize old forms and other information reliably. Unfortunately, there is no comprehensive dataset that covers all types of Bangla compound characters. MatriVasha is an attempt to address this for compound characters, a challenging task because each person has a unique handwriting style. To this end, MatriVasha proposes a dataset for recognizing 120 Bangla compound characters, consisting of 2,552 isolated handwritten characters written by unique writers and collected within Bangladesh. Because the samples were collected from writers across a variety of districts and age groups, with equal numbers of males and females, the dataset also supports district-, age-, and gender-based handwriting research. To date, the proposed dataset is the most extensive dataset of Bangla compound characters. It is intended to support the development of recognition techniques for handwritten Bangla compound characters. In the future, the dataset will be made publicly available to help broaden research in this area.
