Recognition of multifont English electronic prescribing based on convolution neural network algorithm

Impact factor: 2.3 | JCR: Q3 (Computer Science)

Bio-Algorithms and Med-Systems | Pub Date: 2020-09-01 | DOI: 10.1515/BAMS-2020-0021

M. Mohammed, E. Mohammed, Mohammed S. Jarjees

Citations: 8

Recognition of multifont English electronic prescribing based on convolution neural network algorithm

Abstract: Printed character recognition is an efficient, automatic method for entering information into a computer: it translates printed or handwritten images into an editable, readable text file. This paper aims to recognize multifont, multisize printed English words for a smart pharmacy application. The recognition system is based on a convolutional neural network (CNN) approach in which lines, words, and characters are segmented in turn, and each separated character is then fed into the CNN for recognition. The open-source OpenCV library is used for preprocessing, where it segments English characters accurately and efficiently; the Keras library with a TensorFlow backend is used for recognition. The training and testing data sets were designed to include 23 different fonts in six different sizes. The CNN achieves an accuracy of 96.6%, the highest among the state-of-the-art machine learning methods it was compared with. This higher classification accuracy indicates that the CNN approach is well suited to recognizing printed English words. When the system was tested on English electronic prescriptions written in each of the proposed fonts, the highest error rate was 0.23%, for the Georgia font.
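The abstract does not say how OpenCV was used to isolate lines, words, and characters. One common technique for clean printed text is projection-profile segmentation, and the dependency-free Python sketch below illustrates it on a tiny binary image. The function names, the `gap` threshold, and the sample `page` are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Tuple

Image = List[List[int]]   # binary image: 1 = ink, 0 = background
Span = Tuple[int, int]    # half-open (start, end) index range


def runs(profile: List[int]) -> List[Span]:
    """Return the ranges where a projection profile is non-zero."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i                  # a run of ink begins
        elif not value and start is not None:
            spans.append((start, i))   # the run ended at a blank row/column
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment_lines(img: Image) -> List[Span]:
    """Split a page into text lines via the horizontal projection (ink per row)."""
    return runs([sum(row) for row in img])


def segment_chars(img: Image, top: int, bottom: int) -> List[Span]:
    """Split one text line into characters via the vertical projection (ink per column)."""
    width = len(img[0])
    profile = [sum(img[r][c] for r in range(top, bottom)) for c in range(width)]
    return runs(profile)


def group_words(chars: List[Span], gap: int = 2) -> List[List[Span]]:
    """Group character spans into words; a blank gap of >= `gap` columns starts a new word."""
    words: List[List[Span]] = []
    for span in chars:
        if words and span[0] - words[-1][-1][1] < gap:
            words[-1].append(span)
        else:
            words.append([span])
    return words


# Hypothetical 5x9 page: one line with three glyphs, then a second short line.
page = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 1, 1],
    [1, 1, 0, 1, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0, 0],
]
lines = segment_lines(page)
chars = segment_chars(page, *lines[0])
words = group_words(chars, gap=2)
```

In a full pipeline of the kind the abstract describes, each character span would be cropped, resized to the classifier's input size, and passed to the CNN. Projection profiles assume deskewed text with clear gaps between glyphs, so touching or kerned characters would need a different method (for example, connected-component analysis).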

Source journal

Bio-Algorithms and Med-Systems (Mathematical & Computational Biology)

CiteScore

3.80

Self-citation rate

0.00%

About the journal: Bio-Algorithms and Med-Systems (BAMS), edited by the Jagiellonian University Medical College, provides a forum for exchanging information in the interdisciplinary field of computational methods applied in medicine. It presents new algorithms and databases that enable progress in collaborations between medicine, informatics, physics, and biochemistry. Projects that link specialists from these disciplines are welcome. Articles in BAMS are published in English. Topics: bioinformatics; systems biology; telemedicine; e-learning in medicine; patient's electronic record; image processing; medical databases.