Text Extraction with Optimal Bi-LSTM

Bahera H. Nayef, Siti Norul Huda Sheikh Abdullah, Rossilawati Sulaiman, Ashwaq Mukred Saeed
DOI: 10.32604/cmc.2023.039528 (https://doi.org/10.32604/cmc.2023.039528)
Journal: Computers, Materials & Continua
Published: 2023-01-01 (Journal Article)
Citations: 0

Abstract

Text extraction from images using traditional techniques of image acquisition and pattern recognition with machine learning is time-consuming because of the large number of features extracted from the images. Deep neural networks offer effective solutions, extracting text features from images with only a few techniques and training on large image datasets with significant results. This study proposes dual max-pooling and concatenated Convolutional Neural Network (CNN) layers with the activation functions ReLU and Optimized Leaky ReLU (OLReLU). The proposed method divides a word image into slices that contain characters, then passes them to deep learning layers to extract feature maps and reconstruct the predicted words. Bidirectional Long Short-Term Memory (BiLSTM) layers extract more compelling features and link the time sequence in both forward and backward directions during the training phase. The Connectionist Temporal Classification (CTC) function computes the training and validation loss rates, and also decodes the extracted features to reconstruct the characters and link them according to their time sequence. The performance of the proposed model is evaluated using training and validation loss errors on the Mjsynth and IAM datasets. On IAM, the average loss error was 2.09% with the proposed dual max-pooling and OLReLU. On the Mjsynth dataset, the best validation loss rate shrank to 2.2% with concatenated CNN layers and ReLU.
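The final stage of the pipeline described above, where CTC decodes the BiLSTM's per-timestep outputs back into characters by merging repeats and dropping blanks, can be sketched with a greedy (best-path) decoder. This is a minimal illustration, not the paper's implementation: the 27-symbol alphabet, the blank at index 0, and the toy logits are all assumptions made for the example.

```python
import numpy as np

# Illustrative alphabet: index 0 is reserved for the CTC blank symbol.
# The paper's actual character set is not reproduced here.
BLANK = 0
ALPHABET = "-abcdefghijklmnopqrstuvwxyz"  # '-' stands in for the blank

def ctc_greedy_decode(logits: np.ndarray) -> str:
    """Best-path CTC decoding over a (timesteps, classes) score matrix:
    take the argmax label at each timestep, merge consecutive repeats,
    and drop blanks, linking the surviving characters in time order."""
    path = logits.argmax(axis=1)  # most likely label per timestep
    decoded = []
    prev = BLANK
    for label in path:
        if label != prev and label != BLANK:
            decoded.append(ALPHABET[label])
        prev = label
    return "".join(decoded)

# Toy example: 6 timesteps whose best path is "c c - a - t",
# which collapses to the word "cat".
T, C = 6, len(ALPHABET)
logits = np.full((T, C), -5.0)
for t, label in enumerate([3, 3, 0, 1, 0, 20]):
    logits[t, label] = 5.0

print(ctc_greedy_decode(logits))  # prints "cat"
```

The repeat-merging step is why CTC needs the blank symbol at all: without it, a genuine double letter (e.g. "ll") would be indistinguishable from one letter held across two timesteps.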