Padding Methods in Convolutional Sequence Model: An Application in Japanese Handwriting Recognition

Nguyen Tuan Nam, P. D. Hung
{"title":"Padding Methods in Convolutional Sequence Model: An Application in Japanese Handwriting Recognition","authors":"Nguyen Tuan Nam, P. D. Hung","doi":"10.1145/3310986.3310998","DOIUrl":null,"url":null,"abstract":"Today, there is a wide range of research cases about end-to-end trained and sequence-to-sequence models applied in the task of handwritten character recognition. Most of which mark the combination between convolutional neural network (CNN) as a feature extraction module and recurrent neural network (RNN) as a sequence-to-sequence module. Notably, the CNN layer can be fed with dynamic sizes of input images while the RNN layer can tolerate dynamic lengths of input data, which subsequently makes up the dynamic feature of the recognition models. However, when the number one priority is to minimize the training timespan, the models are to receive training data in the form of mini-batch, which requires resizing or padding images into an equal size instead of using original multiple-size pictures due to the fact that most of the deep learning frameworks (such as keras, tensorflow, caffe, etc.) only accept the same-size input and output in one mini-batch. Actually, this practice may lower the model dynamicity in the training process. So, the question is whether it might be a trade-off between the effectiveness (level of accuracy) and the time optimization of the model. 
In this paper, we will examine different impact of various padding and non-padding methods on the same model architecture for Japanese handwriting recognition before finally concluding on which method has the most reasonable training time but can produce an accuracy rate of up to 95%.","PeriodicalId":252781,"journal":{"name":"Proceedings of the 3rd International Conference on Machine Learning and Soft Computing","volume":"40 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"23","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 3rd International Conference on Machine Learning and Soft Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3310986.3310998","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 23

Abstract

Today, a wide range of research applies end-to-end trained, sequence-to-sequence models to handwritten character recognition. Most of these combine a convolutional neural network (CNN) as a feature-extraction module with a recurrent neural network (RNN) as a sequence-to-sequence module. Notably, the CNN layers can accept input images of varying sizes, and the RNN layers can handle input sequences of varying lengths, which together make the recognition models dynamic. However, when the top priority is to minimize training time, the models must receive training data in mini-batches, which requires resizing or padding images to a common size instead of using the original variable-size pictures, because most deep learning frameworks (such as Keras, TensorFlow, and Caffe) accept only same-size inputs and outputs within one mini-batch. This practice may reduce the model's dynamicity during training, so the question is whether there is a trade-off between effectiveness (accuracy) and training-time optimization. In this paper, we examine the impact of various padding and non-padding methods on the same model architecture for Japanese handwriting recognition, and conclude which method achieves the most reasonable training time while producing an accuracy of up to 95%.
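The mini-batch constraint described above can be illustrated with a minimal sketch: variable-size images are padded to the largest height and width in the batch so they can be stacked into one tensor. This is not the padding scheme evaluated in the paper; the function `pad_batch` and its top-left placement are illustrative assumptions.

```python
import numpy as np

def pad_batch(images, pad_value=0):
    """Pad variable-size grayscale images to a common size so they
    can be stacked into one mini-batch tensor.

    `images` is a list of 2-D numpy arrays (H x W); each image is
    placed in the top-left corner and the rest is filled with
    `pad_value`. Illustrative only, not the paper's method.
    """
    max_h = max(img.shape[0] for img in images)
    max_w = max(img.shape[1] for img in images)
    batch = np.full((len(images), max_h, max_w), pad_value,
                    dtype=images[0].dtype)
    for i, img in enumerate(images):
        h, w = img.shape
        batch[i, :h, :w] = img  # original pixels, padding elsewhere
    return batch

# Two images of different sizes become one (2, 4, 5) batch tensor.
batch = pad_batch([np.ones((3, 5)), np.ones((4, 2))])
print(batch.shape)  # (2, 4, 5)
```

A non-padding alternative, by contrast, would group same-size images together or process them one at a time, which preserves the model's dynamicity at the cost of batching efficiency.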