{"title":"IBN-STR: A Robust Text Recognizer for Irregular Text in Natural Scenes","authors":"Xiaoqian Li, Jie Liu, Guixuan Zhang, Shuwu Zhang","doi":"10.1109/ICPR48806.2021.9412775","DOIUrl":"https://doi.org/10.1109/ICPR48806.2021.9412775","url":null,"abstract":"Although text recognition methods based on deep neural networks have promising performance, there are still challenges due to the variety of text styles, perspective distortion, text with large curvature, and so on. To obtain a robust text recognizer, we have improved the performance from two aspects: data aspect and feature representation aspect. In terms of data, we transform the input images into S-shape distorted images in order to increase the diversity of training data. Besides, we explore the effects of different training data. In terms of feature representation, the combination of instance normalization and batch normalization improves the model's capacity and generalization ability. This paper proposes a robust scene text recognizer IBN-STR, which is an attention-based model. Through extensive experiments, the model analysis and comparison have been carried out from the aspects of data and feature representation, and the effectiveness of IBN-STR on both regular and irregular text instances has been verified. 
Furthermore, IBN-STR is an end-to-end recognition system that can achieve state-of-the-art performance.","PeriodicalId":6783,"journal":{"name":"2020 25th International Conference on Pattern Recognition (ICPR)","volume":"25 1","pages":"9522-9528"},"PeriodicalIF":0.0,"publicationDate":"2021-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85100118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Attention Based Multi-Instance Thyroid Cytopathological Diagnosis with Multi-Scale Feature Fusion","authors":"Shuhao Qiu, Yao Guo, Chuang Zhu, Wenli Zhou, Huang Chen","doi":"10.1109/ICPR48806.2021.9413184","DOIUrl":"https://doi.org/10.1109/ICPR48806.2021.9413184","url":null,"abstract":"In recent years, deep learning has been popular in combining with cytopathology diagnosis. Using the whole slide images (WSI) scanned by electronic scanners at clinics, researchers have developed many algorithms to classify the slide (benign or malignant). However, the key area that support the diagnosis result can be relatively small in a thyroid WSI, and only the global label can be acquired, which make the direct use of the strongly supervised learning framework infeasible. What's more, because the clinical diagnosis of the thyroid cells requires the use of visual features in different scales, a generic feature extraction way may not achieve good performance. In this paper, we propose a weakly supervised multi-instance learning framework based on attention mechanism with multi-scale feature fusion (MSF) using convolutional neural network (CNN) for thyroid cytopathological diagnosis. We take each WSI as a bag, each bag contains multiple instances which are the different regions of the WSI, our framework is trained to learn the key area automatically and make the classification. We also propose a feature fusion structure, merge the low-level features into the final feature map and add an instance-level attention module in it, which improves the classification accuracy. Our model is trained and tested on the collected clinical data, reaches the accuracy of 93.2 %, which outperforms the other existing methods. 
We also tested our model on a public histopathology dataset and achieves better result than the state-of-the-art deep multi-instance method.","PeriodicalId":6783,"journal":{"name":"2020 25th International Conference on Pattern Recognition (ICPR)","volume":"48 1","pages":"3536-3541"},"PeriodicalIF":0.0,"publicationDate":"2021-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85985026","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Attribute Regression Network for Face Reconstruction","authors":"Xiangzheng Li, Suping Wu","doi":"10.1109/ICPR48806.2021.9412668","DOIUrl":"https://doi.org/10.1109/ICPR48806.2021.9412668","url":null,"abstract":"In this paper, we propose a multi-attribute regression network (MARN) to investigate the problem of face reconstruction, especially in challenging cases when faces undergo large variations including severe poses, extreme expressions, and partial occlusions in unconstrained environments. The traditional 3DMM parametric regression method does not distinguish the learning of identity, expression, and attitude attributes, resulting in lacking geometric details in the reconstructed face. We propose to learn a face multi-attribute features during 3D face reconstruction from single 2D images. Our MARN enables the network to better extract the feature information of face identity, expression, and pose attributes. We introduce three loss functions to constrain the above three face attributes respectively. At the same time, we carefully design the geometric contour constraint loss function, using the constraints of sparse 2D face landmarks to improve the reconstructed geometric contour information. 
The experimental results show that our MARN has achieved significant improvements in 3D face reconstruction and face alignment on the AFLW2000-3D and AFLW datasets.","PeriodicalId":6783,"journal":{"name":"2020 25th International Conference on Pattern Recognition (ICPR)","volume":"463 1","pages":"7226-7233"},"PeriodicalIF":0.0,"publicationDate":"2021-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76609785","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BiLuNet: A Multi-path Network for Semantic Segmentation on X-ray Images","authors":"V. Tran, Huei-Yung Lin, Hsiao-Wei Liu, Fang-Jie Jang, Chun-Han Tseng","doi":"10.1109/ICPR48806.2021.9412027","DOIUrl":"https://doi.org/10.1109/ICPR48806.2021.9412027","url":null,"abstract":"Semantic segmentation and shape detection of lumbar vertebrae, sacrum, and femoral heads from clinical X-ray images are important and challenging tasks. In this paper, we propose a new multi-path convolutional neural network, BiLuNet, for semantic segmentation on X-ray images. The network is capable of medical image segmentation with very limited training data. With the shape fitting of the bones, we can identify the location of the target regions very accurately for lumbar vertebra inspection. We collected our dataset and annotated by doctors for model training and performance evaluation. Compared to the state-of-the-art methods, the proposed technique provides better mIoUs and higher success rates with the same training data. The experimental results have demonstrated the feasibility of our network to perform semantic segmentation for lumbar vertebrae, sacrum, and femoral heads. Code is available at: https://github.com/LuanTran07/BiLUnet-Lumbar-Spine.","PeriodicalId":6783,"journal":{"name":"2020 25th International Conference on Pattern Recognition (ICPR)","volume":"34 1","pages":"10034-10041"},"PeriodicalIF":0.0,"publicationDate":"2021-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76989356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AOAM: Automatic Optimization of Adjacency Matrix for Graph Convolutional Network","authors":"Yuhang Zhang, Hongshuai Ren, Jiexia Ye, Xitong Gao, Yang Wang, Kejiang Ye, Chengzhong Xu","doi":"10.1109/ICPR48806.2021.9412046","DOIUrl":"https://doi.org/10.1109/ICPR48806.2021.9412046","url":null,"abstract":"Graph Convolutional Network (GCN) is adopted to tackle the problem of convolution operation in non-Euclidean space. Previous works on GCN have made some progress, however, one of their limitations is that the design of Adjacency Matrix (AM) as GCN input requires domain knowledge and such process is cumbersome, tedious and error-prone. In addition, entries of a fixed Adjacency Matrix are generally designed as binary values (i.e., ones and zeros) which can not reflect the real relationship between nodes. Meanwhile, many applications require a weighted and dynamic Adjacency Matrix instead of an unweighted and fixed AM, and there are few works focusing on designing a more flexible Adjacency Matrix. To that end, we propose an end-to-end algorithm to improve the GCN performance by focusing on the Adjacency Matrix. We first provide a calculation method called node information entropy to update the matrix. Then, we perform the search strategy in a continuous space and introduce the Deep Deterministic Policy Gradient (DDPG) method to overcome the drawback of the discrete space search. Finally, we integrate the GCN and reinforcement learning into an end-to-end framework. Our method can automatically define the Adjacency Matrix without prior knowledge. At the same time, the proposed approach can deal with any size of the matrix and provide a better AM for network. Four popular datasets are selected to evaluate the capability of our algorithm. 
The method in this paper achieves the state-of-the-art performance on Cora and Pubmed datasets, with the accuracy of 84.6% and 81.6% respectively.","PeriodicalId":6783,"journal":{"name":"2020 25th International Conference on Pattern Recognition (ICPR)","volume":"223 1","pages":"5130-5136"},"PeriodicalIF":0.0,"publicationDate":"2021-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76992319","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Augmentation of Small Training Data Using GANs for Enhancing the Performance of Image Classification","authors":"S. Hung, J. Q. Gan","doi":"10.1109/ICPR48806.2021.9412399","DOIUrl":"https://doi.org/10.1109/ICPR48806.2021.9412399","url":null,"abstract":"It is difficult to achieve high performance without sufficient training data for deep convolutional neural networks (DCNNs) to learn. Data augmentation plays an important role in improving robustness and preventing overfitting in machine learning for many applications such as image classification. In this paper, a novel method for data augmentation is proposed to solve the problem of machine learning with small training datasets. The proposed method can synthesise similar images with rich diversity from only a single original training sample to increase the number of training data by using generative adversarial networks (GANs). It is expected that the synthesised images possess class-informative features, which may be in the validation or testing data but not in the training data due to that the training dataset is small, and thus they can be effective as augmented training data to improve the classification accuracy of DCNNs. 
The experimental results have demonstrated that the proposed method with a novel GAN framework for image training data augmentation can significantly enhance the classification performance of DCNNs for applications where original training data is limited.","PeriodicalId":6783,"journal":{"name":"2020 25th International Conference on Pattern Recognition (ICPR)","volume":"5 1","pages":"3350-3356"},"PeriodicalIF":0.0,"publicationDate":"2021-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80908028","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MEAN: Multi - Element Attention Network for Scene Text Recognition","authors":"Ruijie Yan, Liangrui Peng, Shanyu Xiao, Gang Yao, Jaesik Min","doi":"10.1109/ICPR48806.2021.9413166","DOIUrl":"https://doi.org/10.1109/ICPR48806.2021.9413166","url":null,"abstract":"Scene text recognition is a challenging problem due to the wide variances in contents, styles, orientations, and image quality of text instances in natural scene images. To learn the intrinsic representation of scene texts, a novel multi-element attention (MEA) mechanism is proposed to exploit geometric structures from local to global levels in feature maps extracted from a scene text image. The MEA mechanism is a generalized form of self-attention technique. The elements in feature maps are taken as the nodes of an undirected graph, and three kinds of adjacency matrices are designed to aggregate information at local, neighborhood and global levels before calculating the attention weights. A multi-element attention network (MEAN) is implemented, which includes a CNN for feature extraction, an encoder with MEA mechanism and a decoder for predicting text codes. Orientational positional encoding is added to feature maps output by the CNN, and a feature vector sequence transformed from the feature maps is used as the input of the encoder. Experimental results show that MEAN has achieved state-of-the-art or competitive performance on seven public English scene text datasets (IIITSk, SVT, IC03, IC13, IC15, SVTP, and CUTE). 
Further experiments have been conducted on a selected subset of the RCTW Chinese scene text dataset, demonstrating that MEAN can handle horizontal, vertical, and irregular scene text samples.","PeriodicalId":6783,"journal":{"name":"2020 25th International Conference on Pattern Recognition (ICPR)","volume":"24 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2021-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80932868","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Writer Identification Using Deep Neural Networks: Impact of Patch Size and Number of Patches","authors":"Akshay Punjabi, J. R. Prieto, E. Vidal","doi":"10.1109/ICPR48806.2021.9412575","DOIUrl":"https://doi.org/10.1109/ICPR48806.2021.9412575","url":null,"abstract":"Traditional approaches for the recognition or identification of the writer of a handwritten text image used to relay on heuristic knowledge about the shape and other features of the strokes of previously segmented characters. However, recent works have done significantly advances on the state of the art thanks to the use of various types of deep neural networks. In most of all of these works, text images are decomposed into patches, which are processed by the networks without any previous character or word segmentation. In this paper, we study how the way images are decomposed into patches impact recognition accuracy, using three publicly available datasets. The study also includes a simpler architecture where no patches are used at all – a single deep neural network inputs a whole text image and directly provides a writer recognition hypothesis. Results show that bigger patches generally lead to improved accuracy, achieving in one of the datasets a significant improvement over the best results reported so far.","PeriodicalId":6783,"journal":{"name":"2020 25th International Conference on Pattern Recognition (ICPR)","volume":"4 1","pages":"9764-9771"},"PeriodicalIF":0.0,"publicationDate":"2021-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80964816","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Deep Learning-Based Method for Predicting Volumes of Nasopharyngeal Carcinoma for Adaptive Radiation Therapy Treatment","authors":"Bilel Daoud, K. Morooka, Shoko Miyauchi, R. Kurazume, W. Mnejja, L. Farhat, J. Daoud","doi":"10.1109/ICPR48806.2021.9412924","DOIUrl":"https://doi.org/10.1109/ICPR48806.2021.9412924","url":null,"abstract":"This paper presents a new system for predicting the spatial change of Nasopharyngeal carcinoma(NPC) and organ-at-risks (OARs) volumes over the course of the radiation therapy (RT) treatment for facilitating the workflow of adaptive radiotherapy. The proposed system, called “Tumor Evolution Prediction (TEP-Net)”, predicts the spatial distributions of NPC and 5 OARs, separately, in response to RT in the coming week, week n. Here, TEP-Net has (n-1)-inputs that are week 1 to week n-1 of CT axial, coronal or sagittal images acquired once the patient complete the planned RT treatment of the corresponding week. As a result, three predicted results of each target region are obtained from the three-view CT images. To determine the final prediction of NPC and 5 OARs, two integration methods, weighted fully connected layers and weighted voting methods, are introduced. 
From the experiments using weekly CT images of 140 NPC patients, our proposed system achieves the best performance for predicting NPC and OARs compared with conventional methods.","PeriodicalId":6783,"journal":{"name":"2020 25th International Conference on Pattern Recognition (ICPR)","volume":"9 1","pages":"3256-3263"},"PeriodicalIF":0.0,"publicationDate":"2021-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80984587","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A delayed Elastic-Net approach for performing adversarial attacks","authors":"Brais Cancela, V. Bolón-Canedo, Amparo Alonso-Betanzos","doi":"10.1109/ICPR48806.2021.9413170","DOIUrl":"https://doi.org/10.1109/ICPR48806.2021.9413170","url":null,"abstract":"With the rise of the so-called Adversarial Attacks, there is an increased concern on model security. In this paper we present two different contributions: novel measures of robustness (based on adversarial attacks) and a novel adversarial attack. The key idea behind these metrics is to obtain a measure that could compare different architectures, with independence of how the input is preprocessed (robustness against different input sizes and value ranges). To do so, a novel adversarial attack is presented, performing a delayed elastic-net adversarial attack (constraints are only used whenever a successful adversarial attack is obtained). Experimental results show that our approach obtains state-of-the-art adversarial samples, in terms of minimal perturbation distance. Finally, a benchmark of ImageNet pretrained models is used to conduct experiments aiming to shed some light about which model should be selected whenever security is a role factor.","PeriodicalId":6783,"journal":{"name":"2020 25th International Conference on Pattern Recognition (ICPR)","volume":"186 3 1","pages":"378-384"},"PeriodicalIF":0.0,"publicationDate":"2021-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81074114","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}