{"title":"基于快速分布平滑的CTC正则化算法在文本识别中的应用","authors":"Ryohei Tanaka, Soichiro Ono, Akio Furuhata","doi":"10.1109/ICDAR.2019.00056","DOIUrl":null,"url":null,"abstract":"Many recent text recognition studies achieved successful performance by applying a sequential-label prediction framework such as connectionist temporal classification. Meanwhile, regularization is known to be essential to avoid overfitting when training deep neural networks. Regularization techniques that allow for semi-supervised learning have a greater impact than those that do not. Among widely researched single-label regularization techniques, virtual adversarial training (VAT) performs successfully by smoothing posterior distributions around training data points. However, VAT is almost solely applied to single-label prediction tasks, not to sequential-label prediction tasks. This is because the number of possible candidates in the label sequence exponentially increases with the sequence length, making it impractical to calculate posterior distributions and the divergence between them. Investigating this problem, we have found that there is an easily computable upper bound for divergence. Here, we propose fast distributional smoothing (FDS) as a method for drastically reducing computational costs by minimizing this upper bound. FDS allows regularization at practical computational costs in both supervised and semi-supervised learning. An experiment under simple settings confirmed that upper-bound minimization decreases divergence. Experiments also show that FDS improves scene text recognition performance and enhances state-of-the-art regularization performance. Furthermore, experiments show that FDS enables efficient semi-supervised learning in sequential-label prediction tasks and that it outperforms a conventional semi-supervised method.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Fast Distributional Smoothing for Regularization in CTC Applied to Text Recognition\",\"authors\":\"Ryohei Tanaka, Soichiro Ono, Akio Furuhata\",\"doi\":\"10.1109/ICDAR.2019.00056\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Many recent text recognition studies achieved successful performance by applying a sequential-label prediction framework such as connectionist temporal classification. Meanwhile, regularization is known to be essential to avoid overfitting when training deep neural networks. Regularization techniques that allow for semi-supervised learning have a greater impact than those that do not. Among widely researched single-label regularization techniques, virtual adversarial training (VAT) performs successfully by smoothing posterior distributions around training data points. However, VAT is almost solely applied to single-label prediction tasks, not to sequential-label prediction tasks. This is because the number of possible candidates in the label sequence exponentially increases with the sequence length, making it impractical to calculate posterior distributions and the divergence between them. Investigating this problem, we have found that there is an easily computable upper bound for divergence. Here, we propose fast distributional smoothing (FDS) as a method for drastically reducing computational costs by minimizing this upper bound. 
FDS allows regularization at practical computational costs in both supervised and semi-supervised learning. An experiment under simple settings confirmed that upper-bound minimization decreases divergence. Experiments also show that FDS improves scene text recognition performance and enhances state-of-the-art regularization performance. Furthermore, experiments show that FDS enables efficient semi-supervised learning in sequential-label prediction tasks and that it outperforms a conventional semi-supervised method.\",\"PeriodicalId\":325437,\"journal\":{\"name\":\"2019 International Conference on Document Analysis and Recognition (ICDAR)\",\"volume\":\"38 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 International Conference on Document Analysis and Recognition (ICDAR)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDAR.2019.00056\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 International Conference on Document Analysis and Recognition (ICDAR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDAR.2019.00056","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Many recent text recognition studies have achieved strong performance by applying a sequential-label prediction framework such as connectionist temporal classification (CTC). Meanwhile, regularization is known to be essential for avoiding overfitting when training deep neural networks, and regularization techniques that also allow for semi-supervised learning have a greater impact than those that do not. Among widely researched single-label regularization techniques, virtual adversarial training (VAT) performs well by smoothing posterior distributions around training data points. However, VAT is applied almost solely to single-label prediction tasks, not to sequential-label prediction tasks, because the number of possible label sequences increases exponentially with the sequence length, making it impractical to compute the posterior distributions over sequences and the divergence between them. Investigating this problem, we found that there is an easily computable upper bound on this divergence. Here, we propose fast distributional smoothing (FDS) as a method for drastically reducing computational costs by minimizing this upper bound. FDS enables regularization at practical computational cost in both supervised and semi-supervised learning. An experiment under simple settings confirms that upper-bound minimization decreases divergence. Experiments also show that FDS improves scene text recognition performance and enhances state-of-the-art regularization performance. Furthermore, experiments show that FDS enables efficient semi-supervised learning in sequential-label prediction tasks and that it outperforms a conventional semi-supervised method.
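The abstract does not state the exact form of the upper bound. One natural reading, and the basis of the sketch below, is that because the CTC sequence distribution is induced from per-frame posteriors, the sequence-level divergence KL(P(.|x) || P(.|x+r)) is bounded above by the sum of frame-wise divergences sum_t KL(p_t(.|x) || p_t(.|x+r)), which is cheap to compute. The following PyTorch sketch applies a VAT-style adversarial perturbation to that frame-wise bound; the function names, the hyperparameters xi, eps and n_power, and the assumption that the model returns frame-wise CTC logits of shape (T, B, C) are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn.functional as F

def framewise_kl(p_logits, q_logits):
    # Sum over frames of KL(p_t || q_t), averaged over the batch.
    # Under the factorization assumption above, this upper-bounds the
    # KL divergence between the sequence-level CTC distributions.
    p_logp = F.log_softmax(p_logits, dim=-1)
    q_logp = F.log_softmax(q_logits, dim=-1)
    kl = (p_logp.exp() * (p_logp - q_logp)).sum(dim=-1)  # shape (T, B)
    return kl.sum(dim=0).mean()

def fds_regularizer(model, x, xi=1e-6, eps=2.0, n_power=1):
    # VAT-style smoothing term computed against the frame-wise bound:
    # find a small input perturbation that maximizes the bound, then penalize it.
    with torch.no_grad():
        clean_logits = model(x)  # assumed shape (T, B, C); treated as the fixed reference
    d = torch.randn_like(x)
    for _ in range(n_power):  # power iteration to approximate the worst-case direction
        d = xi * F.normalize(d.flatten(1), dim=1).view_as(x)
        d.requires_grad_(True)
        bound = framewise_kl(clean_logits, model(x + d))
        d = torch.autograd.grad(bound, d)[0]
    r_adv = eps * F.normalize(d.flatten(1), dim=1).view_as(x)
    return framewise_kl(clean_logits, model(x + r_adv))

In use, such a term would be added to the usual CTC loss with a weighting coefficient, e.g. loss = ctc_loss + lam * fds_regularizer(model, images). Because it requires no labels, the same term can also be evaluated on unlabeled images, which is how a regularizer of this kind supports the semi-supervised training described in the abstract.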