{"title":"Tied Spatial Transformer Networks for Digit Recognition","authors":"B. Cirstea, Laurence Likforman-Sulem","doi":"10.1109/ICFHR.2016.0102","DOIUrl":null,"url":null,"abstract":"This paper reports a new approach based on convolutional neural networks (CNNs), which uses spatial transformer networks (STNs). The approach, referred to as Tied Spatial Transformer Networks (TSTNs), consists of training a system which combines a localization CNN and a classification CNN whose weights are shared. The localization CNN is used for predicting an affine transform for the input image, which is then processed according to the predicted parameters and passed through the classification CNN. We have conducted initial experiments on the cluttered MNIST dataset of noisy digits, comparing the TSTN and STN with identical configurations of trainable parameters, but untied, as well as the classification CNN only, applied to the unprocessed images. In all these cases, we obtain better results using the TSTN. We conjecture that the TSTN provides a regularization effect, as compared to untied STNs. Further experiments seem to support this hypothesis.","PeriodicalId":194844,"journal":{"name":"2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR)","volume":"763 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICFHR.2016.0102","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 4
Abstract
This paper reports a new approach based on convolutional neural networks (CNNs) that builds on spatial transformer networks (STNs). The approach, referred to as Tied Spatial Transformer Networks (TSTNs), trains a system combining a localization CNN and a classification CNN whose weights are shared. The localization CNN predicts an affine transform for the input image; the image is then warped according to the predicted parameters and passed through the classification CNN. We have conducted initial experiments on the cluttered MNIST dataset of noisy digits, comparing the TSTN against an STN with an identical configuration of trainable parameters but untied weights, as well as against the classification CNN alone applied to the unprocessed images. In all these cases, we obtain better results with the TSTN. We conjecture that the TSTN provides a regularization effect compared to untied STNs. Further experiments seem to support this hypothesis.
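To make the STN mechanism described above concrete, the following is a minimal NumPy sketch of its core step: the localization network's predicted 2x3 affine matrix is turned into a sampling grid, and the input image is resampled on that grid before classification. This is an illustrative reconstruction, not the authors' code; it uses nearest-neighbour sampling for brevity, whereas STNs typically use differentiable bilinear interpolation, and all function names here are hypothetical.

```python
import numpy as np

def affine_grid(theta, H, W):
    """Build source-coordinate grids from a 2x3 affine matrix `theta`.

    Target pixel coordinates are expressed in normalized [-1, 1] range,
    as in the STN formulation, and mapped back into the source image.
    """
    ys, xs = np.meshgrid(np.linspace(-1, 1, H),
                         np.linspace(-1, 1, W), indexing="ij")
    # Homogeneous target coordinates, shape (3, H*W)
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(H * W)])
    src = theta @ coords  # (2, H*W): where each output pixel samples from
    return src[0].reshape(H, W), src[1].reshape(H, W)

def sample(img, sx, sy):
    """Nearest-neighbour sampling (bilinear in a real STN) with clamping."""
    H, W = img.shape
    ix = np.clip(np.round((sx + 1) * (W - 1) / 2).astype(int), 0, W - 1)
    iy = np.clip(np.round((sy + 1) * (H - 1) / 2).astype(int), 0, H - 1)
    return img[iy, ix]

# With the identity transform, the warped image equals the input;
# a trained localization CNN would instead predict a theta that
# zooms onto / straightens the digit within the cluttered image.
img = np.arange(16, dtype=float).reshape(4, 4)
theta_identity = np.array([[1.0, 0.0, 0.0],
                           [0.0, 1.0, 0.0]])
sx, sy = affine_grid(theta_identity, 4, 4)
out = sample(img, sx, sy)
```

In the tied (TSTN) variant, the localization CNN and the classification CNN would additionally share their convolutional weight tensors, which the paper conjectures acts as a regularizer relative to the untied STN.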