Padding Methods in Convolutional Sequence Model: An Application in Japanese Handwriting Recognition

Nguyen Tuan Nam, P. D. Hung
{"title":"Padding Methods in Convolutional Sequence Model: An Application in Japanese Handwriting Recognition","authors":"Nguyen Tuan Nam, P. D. Hung","doi":"10.1145/3310986.3310998","DOIUrl":null,"url":null,"abstract":"Today, there is a wide range of research cases about end-to-end trained and sequence-to-sequence models applied in the task of handwritten character recognition. Most of which mark the combination between convolutional neural network (CNN) as a feature extraction module and recurrent neural network (RNN) as a sequence-to-sequence module. Notably, the CNN layer can be fed with dynamic sizes of input images while the RNN layer can tolerate dynamic lengths of input data, which subsequently makes up the dynamic feature of the recognition models. However, when the number one priority is to minimize the training timespan, the models are to receive training data in the form of mini-batch, which requires resizing or padding images into an equal size instead of using original multiple-size pictures due to the fact that most of the deep learning frameworks (such as keras, tensorflow, caffe, etc.) only accept the same-size input and output in one mini-batch. Actually, this practice may lower the model dynamicity in the training process. So, the question is whether it might be a trade-off between the effectiveness (level of accuracy) and the time optimization of the model. 
In this paper, we will examine different impact of various padding and non-padding methods on the same model architecture for Japanese handwriting recognition before finally concluding on which method has the most reasonable training time but can produce an accuracy rate of up to 95%.","PeriodicalId":252781,"journal":{"name":"Proceedings of the 3rd International Conference on Machine Learning and Soft Computing","volume":"40 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"23","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 3rd International Conference on Machine Learning and Soft Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3310986.3310998","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 23

Abstract

Today, a wide range of research applies end-to-end trained, sequence-to-sequence models to handwritten character recognition. Most of these combine a convolutional neural network (CNN) as a feature-extraction module with a recurrent neural network (RNN) as a sequence-to-sequence module. Notably, the CNN layers can accept input images of varying sizes, and the RNN layers can handle input sequences of varying lengths, which together make the recognition models dynamic. However, when the top priority is to minimize training time, the models must receive training data in mini-batches, which requires resizing or padding images to a common size instead of using the original variable-size pictures, because most deep learning frameworks (such as Keras, TensorFlow, and Caffe) accept only same-size inputs and outputs within one mini-batch. This practice may reduce the model's dynamicity during training, so the question is whether there is a trade-off between effectiveness (accuracy) and training-time optimization. In this paper, we examine the impact of various padding and non-padding methods on the same model architecture for Japanese handwriting recognition, and conclude which method achieves the most reasonable training time while producing an accuracy of up to 95%.
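The mini-batch constraint described above can be illustrated with a minimal sketch: variable-size images are padded to the largest height and width in the batch so they can be stacked into one tensor. This is not the padding scheme evaluated in the paper; the function `pad_batch` and its top-left placement are illustrative assumptions.

```python
import numpy as np

def pad_batch(images, pad_value=0):
    """Pad variable-size grayscale images to a common size so they
    can be stacked into one mini-batch tensor.

    `images` is a list of 2-D numpy arrays (H x W); each image is
    placed in the top-left corner and the rest is filled with
    `pad_value`. Illustrative only, not the paper's method.
    """
    max_h = max(img.shape[0] for img in images)
    max_w = max(img.shape[1] for img in images)
    batch = np.full((len(images), max_h, max_w), pad_value,
                    dtype=images[0].dtype)
    for i, img in enumerate(images):
        h, w = img.shape
        batch[i, :h, :w] = img  # original pixels, padding elsewhere
    return batch

# Two images of different sizes become one (2, 4, 5) batch tensor.
batch = pad_batch([np.ones((3, 5)), np.ones((4, 2))])
print(batch.shape)  # (2, 4, 5)
```

A non-padding alternative, by contrast, would group same-size images together or process them one at a time, which preserves the model's dynamicity at the cost of batching efficiency.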