A vision transformer-based hybrid neural architecture for automated handwritten Bangla character recognition and braille conversion

IF 7.6 | Zone 1, Computer Science | Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Knowledge-Based Systems Pub Date : 2025-09-27 DOI:10.1016/j.knosys.2025.114546

Touseef Saleh Bin Ahmed , Tawhidur Rahman , Shammo Biswas , Saifur Rahman Sabuj , Mohammed Belal Bhuian , Mohammad Ali Moni , Md Ashraful Alam

Knowledge-Based Systems, Volume 330, Article 114546. Journal Article. https://www.sciencedirect.com/science/article/pii/S0950705125015850

Citations: 0

Abstract


A vision transformer-based hybrid neural architecture for automated handwritten Bangla character recognition and braille conversion

The rapid advancement of technology has led to notable changes in the current educational system. Nevertheless, there are still relatively few assistive aids for teaching individuals with disabilities, such as those who are blind or visually impaired. Braille is an effective teaching strategy for those who are blind or visually impaired. Although braille has been digitized into electronic versions, those versions do not handle handwritten characters. Studies on English character recognition have shown high accuracy, which is not the case with Bangla character recognition. We present an automated system that converts handwritten Bangla characters to braille using novel hybrid deep neural network architectures. Our approach begins with a Character Quality Assessment Framework (CQAF), which employs adaptive thresholds and comprehensive quality metrics designed explicitly for Bangla script characteristics. Building upon this foundation, we present two architectures. HybridNet-L represents our initial multi-stream design, while HybridNet-S is a redesigned lightweight variant that reduces parameters and achieves superior accuracy, making it the primary contribution of this work. To complete the system, we implement a comprehensive accessibility solution featuring a real-time braille hardware interface and text-to-speech capabilities. The model effectively processes all 84 Bangla character classes, including vowels, consonants, numerals, and compound characters. Extensive evaluation against seven baseline models demonstrates that our HybridNet-S achieves superior performance with 95.80% validation accuracy while maintaining computational efficiency suitable for embedded deployment. Statistical validation and ablation studies confirm the robustness and effectiveness of our multi-stream architecture for practical assistive technology applications.
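The abstract does not describe how a recognized character is turned into a braille cell, so the following is only a minimal sketch of one plausible final step, not the authors' implementation. It relies on one fact about Unicode: the 6-dot braille patterns start at U+2800, and dot n sets bit n-1. The names `dots_to_cell`, `to_braille`, and `LABEL_TO_DOTS` are hypothetical, and the two table entries are illustrative placeholders that should be checked against a Bharati braille table before any real use.

```python
# Unicode places the 6-dot braille cells at U+2800..U+283F; dot n sets bit n-1.
BRAILLE_BASE = 0x2800


def dots_to_cell(dots):
    """Return the Unicode braille character with the given raised dots (1-6)."""
    mask = 0
    for d in dots:
        if not 1 <= d <= 6:
            raise ValueError(f"dot number out of range: {d}")
        mask |= 1 << (d - 1)
    return chr(BRAILLE_BASE + mask)


# Hypothetical lookup from a recognized class label to its raised dots.
# Illustrative entries only (an assumption, not taken from the paper);
# verify against a Bharati braille table before use.
LABEL_TO_DOTS = {
    "অ": (1,),
    "ক": (1, 3),
}


def to_braille(labels):
    """Convert a sequence of recognized class labels into a braille string."""
    return "".join(dots_to_cell(LABEL_TO_DOTS[label]) for label in labels)


if __name__ == "__main__":
    # Labels would come from the classifier's predictions in a real pipeline.
    print(to_braille(["ক", "অ"]))
```

Keeping the dot tuples separate from the Unicode encoding means the same table could also drive a hardware cell, where each dot number would select one actuator pin.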


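Source journal

Knowledge-Based Systems (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore: 14.80

Self-citation rate: 12.50%

Articles published: 1245

Review time: 7.8 months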

About the journal: Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on knowledge-based and other artificial intelligence techniques-based systems. The journal aims to support human prediction and decision-making through data science and computation techniques, provide a balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.