M. Hasan, Mahathir Mohammad Abir, Md. Ibrahim, M. Sayem, Sohaib Abdullah
2019 International Conference on Bangla Speech and Language Processing (ICBSLP), published 2019-09-01
DOI: 10.1109/ICBSLP47725.2019.201481 (https://doi.org/10.1109/ICBSLP47725.2019.201481)
AIBangla: A Benchmark Dataset for Isolated Bangla Handwritten Basic and Compound Character Recognition
Automatic handwritten Bangla character recognition (HBCR) is a challenging problem in computer vision because an individual Bangla character can be written in many different styles and different characters often have similar shapes. Given the complexity of the problem, accurate recognition calls for a modern convolutional neural network (CNN); unfortunately, very few Bangla handwritten datasets currently contain enough image samples per character to train deep learning-based methods. In this paper, we present AIBangla, a new benchmark image database for isolated handwritten Bangla characters, with detailed usage and a performance baseline. Our dataset contains 80,403 handwritten images of 50 Bangla basic characters and 249,911 handwritten images of 171 Bangla compound characters, written by more than 2,000 unique writers from various institutes across Bangladesh. In addition, we have applied three state-of-the-art deep CNNs to the proposed AIBangla dataset to provide baseline performance. We achieved maximum accuracies of 98.13% and 81.83% for the basic and compound character classes, respectively, on the test set of the AIBangla dataset.
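To put the reported dataset sizes in perspective, the following minimal sketch works out the total image count, the total class count, and the mean number of images per class for each subset. Only the image and class counts come from the abstract; the names `SUBSETS` and `mean_images_per_class` are illustrative, and the means say nothing about how evenly the samples are actually distributed across classes, which the abstract does not state.

```python
# Counts as reported in the abstract; everything else here is illustrative.
SUBSETS = {
    "basic": {"images": 80_403, "classes": 50},
    "compound": {"images": 249_911, "classes": 171},
}


def mean_images_per_class(images: int, classes: int) -> float:
    """Average number of images per character class in one subset."""
    return images / classes


# Mean images per class for each subset (an average, not a per-class count).
means = {name: mean_images_per_class(**s) for name, s in SUBSETS.items()}

# Totals across both subsets.
total_images = sum(s["images"] for s in SUBSETS.values())
total_classes = sum(s["classes"] for s in SUBSETS.values())

if __name__ == "__main__":
    for name, value in means.items():
        print(f"{name}: {value:.2f} images per class on average")
    print(f"total: {total_images} images over {total_classes} classes")
```

Both subsets average well over a thousand images per class, which is consistent with the abstract's stated aim of supplying enough samples per character for deep learning-based methods.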