{"title":"Padding Methods in Convolutional Sequence Model: An Application in Japanese Handwriting Recognition","authors":"Nguyen Tuan Nam, P. D. Hung","doi":"10.1145/3310986.3310998","DOIUrl":null,"url":null,"abstract":"Today, there is a wide range of research cases about end-to-end trained and sequence-to-sequence models applied in the task of handwritten character recognition. Most of which mark the combination between convolutional neural network (CNN) as a feature extraction module and recurrent neural network (RNN) as a sequence-to-sequence module. Notably, the CNN layer can be fed with dynamic sizes of input images while the RNN layer can tolerate dynamic lengths of input data, which subsequently makes up the dynamic feature of the recognition models. However, when the number one priority is to minimize the training timespan, the models are to receive training data in the form of mini-batch, which requires resizing or padding images into an equal size instead of using original multiple-size pictures due to the fact that most of the deep learning frameworks (such as keras, tensorflow, caffe, etc.) only accept the same-size input and output in one mini-batch. Actually, this practice may lower the model dynamicity in the training process. So, the question is whether it might be a trade-off between the effectiveness (level of accuracy) and the time optimization of the model. In this paper, we will examine different impact of various padding and non-padding methods on the same model architecture for Japanese handwriting recognition before finally concluding on which method has the most reasonable training time but can produce an accuracy rate of up to 95%.","PeriodicalId":252781,"journal":{"name":"Proceedings of the 3rd International Conference on Machine Learning and Soft Computing","volume":"40 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"23","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 3rd International Conference on Machine Learning and Soft Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3310986.3310998","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 23
Abstract
Today, there is a wide range of research on end-to-end trained, sequence-to-sequence models for handwritten character recognition. Most of these combine a convolutional neural network (CNN) as a feature-extraction module with a recurrent neural network (RNN) as a sequence-to-sequence module. Notably, the CNN layers can accept input images of varying sizes, and the RNN layers can tolerate input sequences of varying lengths, which together give these recognition models their dynamic character. However, when the top priority is to minimize training time, the models must receive training data in mini-batches, which requires resizing or padding the images to a common size rather than using the original variable-size pictures, because most deep learning frameworks (such as Keras, TensorFlow, and Caffe) accept only same-size inputs and outputs within one mini-batch. This practice may reduce the model's dynamicity during training. The question, then, is whether there is a trade-off between effectiveness (accuracy) and training-time optimization. In this paper, we examine the impact of various padding and non-padding methods on the same model architecture for Japanese handwriting recognition, and conclude which method offers the most reasonable training time while still producing an accuracy rate of up to 95%.
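To make the mini-batch constraint concrete, the sketch below shows one common way to form an equal-size batch from variable-size images: pad each image with a constant value up to the maximum height and width found in the batch. This is an illustrative sketch only, not the paper's exact pipeline; the function name pad_batch, the use of grayscale NumPy arrays, and zero padding are all assumptions for the example.

```python
import numpy as np

def pad_batch(images, pad_value=0.0):
    """Pad a list of variable-size grayscale images (H x W arrays) to the
    maximum height and width in the batch, so they can be stacked into a
    single mini-batch tensor of shape (N, max_H, max_W).

    Hypothetical helper for illustration; not from the paper."""
    max_h = max(img.shape[0] for img in images)
    max_w = max(img.shape[1] for img in images)
    # Start from a constant-filled canvas and copy each image into it.
    batch = np.full((len(images), max_h, max_w), pad_value, dtype=np.float32)
    for i, img in enumerate(images):
        h, w = img.shape
        batch[i, :h, :w] = img  # original pixels go in the top-left corner
    return batch

# Example: three handwriting crops of different sizes become one 3 x 48 x 64 batch.
imgs = [np.random.rand(32, 64), np.random.rand(48, 40), np.random.rand(40, 56)]
print(pad_batch(imgs).shape)  # (3, 48, 64)
```

Because every image in the batch now shares one shape, the batch can be fed to a framework that requires same-size inputs per mini-batch, at the cost of the padded regions carrying no handwriting information, which is exactly the accuracy-versus-training-time trade-off the paper investigates.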