{"title":"GAN-Based Image Deblurring Using DCT Discriminator","authors":"Hiroki Tomosada, Takahiro Kudo, Takanori Fujisawa, M. Ikehara","doi":"10.1109/ICPR48806.2021.9412584","DOIUrl":"https://doi.org/10.1109/ICPR48806.2021.9412584","url":null,"abstract":"In this paper, we propose high-quality image deblurring using the discrete cosine transform (DCT) with low computational complexity. Recently, Convolutional Neural Network (CNN) and Generative Adversarial Network (GAN) based algorithms have been proposed for image deblurring. Multi-scale CNN architectures restore blurred images clearly and suppress ringing artifacts and block noise, but they take much time to process. To solve these problems, we propose a method, named “DeblurDCTGAN”, that preserves texture and suppresses ringing artifacts in the restored image without a multi-scale architecture by using a DCT-based loss. It compares the frequency-domain representations of the deblurred image and the ground-truth image obtained via the DCT. Thereby, DeblurDCTGAN can reduce block noise and ringing artifacts while maintaining deblurring performance. Our experimental results show that DeblurDCTGAN achieves the highest PSNR and SSIM among the compared conventional methods on the GoPro, DVD, NFS and HIDE test datasets, and that its running time per image pair is lower than that of the other methods.","PeriodicalId":6783,"journal":{"name":"2020 25th International Conference on Pattern Recognition (ICPR)","volume":"10 1","pages":"3675-3681"},"PeriodicalIF":0.0,"publicationDate":"2021-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88938621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Estimating Static and Dynamic Brain Networks by Kullback-Leibler Divergence from fMRI Data","authors":"Gonul Gunal Degirmendereli, F. Yarman-Vural","doi":"10.1109/ICPR48806.2021.9413047","DOIUrl":"https://doi.org/10.1109/ICPR48806.2021.9413047","url":null,"abstract":"Representing brain activities by networks is crucial for understanding various cognitive states. This study proposes a novel method to estimate static and dynamic brain networks using the Kullback-Leibler divergence. The suggested brain networks are based on the probability distributions of voxel intensity values measured by functional Magnetic Resonance Imaging (fMRI), recorded while the subjects perform a predefined cognitive task, called complex problem solving. We investigate the validity of the estimated brain networks by modeling and analyzing the different phases of the human brain's complex problem solving process, namely the planning and execution phases. The suggested computational network model is tested with a classification scheme using Support Vector Machines. We observe that the network models can successfully discriminate the planning and execution phases of the complex problem solving process with more than 90% accuracy when the estimated dynamic networks, extracted from the fMRI data, are classified by Support Vector Machines.","PeriodicalId":6783,"journal":{"name":"2020 25th International Conference on Pattern Recognition (ICPR)","volume":"1 1","pages":"5913-5919"},"PeriodicalIF":0.0,"publicationDate":"2021-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83768066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Energy Minimum Regularization in Continual Learning","authors":"Xiaobin Li, Lianlei Shan, Minglong Li, Weiqiang Wang","doi":"10.1109/ICPR48806.2021.9412744","DOIUrl":"https://doi.org/10.1109/ICPR48806.2021.9412744","url":null,"abstract":"How to give agents the ability to learn continually, as humans and animals do, is still a challenge. The regularized continual learning method OWM ignores the constraint on the energy compression of learned tasks, which results in poor performance on datasets with a large number of learning tasks. In this paper, we propose an energy minimum regularization (EMR) method that constrains the energy of learned tasks, provides enough learning space for subsequent, not-yet-learned tasks, and increases the capacity of the model with respect to the number of learning tasks. Extensive experiments show that our method can effectively increase the capacity of the model and reduce its sensitivity to the number of tasks and the size of the network.","PeriodicalId":6783,"journal":{"name":"2020 25th International Conference on Pattern Recognition (ICPR)","volume":"129 1","pages":"6404-6409"},"PeriodicalIF":0.0,"publicationDate":"2021-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79587855","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quasibinary Classifier for Images with Zero and Multiple Labels","authors":"Shuai Liao, E. Gavves, Changyong Oh, Cees G. M. Snoek","doi":"10.1109/ICPR48806.2021.9412933","DOIUrl":"https://doi.org/10.1109/ICPR48806.2021.9412933","url":null,"abstract":"The softmax and binary classifiers are commonly preferred for image classification applications. However, as softmax is specifically designed for categorical classification, it assumes each image has exactly one class label. This limits its applicability to problems where the number of labels does not equal one, most notably zero- and multi-label problems. In these challenging settings, binary classifiers are, in theory, better suited. However, as they ignore the correlation between classes, they are not as accurate and scalable in practice. In this paper, we start from the observation that the only difference between binary and softmax classifiers is their normalization function. Specifically, while the binary classifier self-normalizes its score, the softmax classifier combines the scores from all classes before normalization. On the basis of this observation, we introduce a normalization function that is learnable, constant, and shared between classes and data points. By doing so, we arrive at a new type of binary classifier that we coin the quasibinary classifier. We show in a variety of image classification settings, and on several datasets, that quasibinary classifiers are considerably better in settings where regular binary and softmax classifiers suffer, including zero-label and multi-label classification. What is more, we show that quasibinary classifiers yield well-calibrated probabilities allowing for direct and reliable comparisons, not only between classes but also between data points.","PeriodicalId":6783,"journal":{"name":"2020 25th International Conference on Pattern Recognition (ICPR)","volume":"100 1","pages":"8743-8750"},"PeriodicalIF":0.0,"publicationDate":"2021-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83350686","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Efficient Empirical Solver for Localized Multiple Kernel Learning via DNNs","authors":"Ziming Zhang","doi":"10.1109/ICPR48806.2021.9411974","DOIUrl":"https://doi.org/10.1109/ICPR48806.2021.9411974","url":null,"abstract":"In this paper, we propose solving localized multiple kernel learning (LMKL) using LMKL-Net, a feedforward deep neural network (DNN). In contrast to previous works, as a learning principle we propose parameterizing the gating function for learning kernel combination weights and the multiclass classifier using an attentional network (AN) and a multilayer perceptron (MLP), respectively. Such interpretability helps us better understand how the network solves the problem. Thanks to stochastic gradient descent (SGD), our approach has linear computational complexity in training. Empirically, on benchmark datasets we demonstrate that, with comparable or better accuracy than the state-of-the-art, our LMKL-Net can be trained about two orders of magnitude faster and with an about two orders of magnitude smaller memory footprint for large-scale learning.","PeriodicalId":6783,"journal":{"name":"2020 25th International Conference on Pattern Recognition (ICPR)","volume":"71 3 1","pages":"647-654"},"PeriodicalIF":0.0,"publicationDate":"2021-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83428738","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Information Graphic Summarization using a Collection of Multimodal Deep Neural Networks","authors":"Edward J. Kim, Connor Onweller, Kathleen F. McCoy","doi":"10.1109/ICPR48806.2021.9412146","DOIUrl":"https://doi.org/10.1109/ICPR48806.2021.9412146","url":null,"abstract":"We present a multimodal deep learning framework that can generate summarization text supporting the main idea of an information graphic for presentation to a person who is blind or visually impaired. The framework utilizes the visual, textual, positional, and size characteristics extracted from the image to create the summary. Different and complementary neural architectures are optimized for each task using crowdsourced training data. From our quantitative experiments and results, we explain the reasoning behind our framework and show the effectiveness of our models. Our qualitative results showcase text generated from our framework and show that Mechanical Turk participants favor it over other automatic and human-generated summaries. We describe the design and results of an experiment to evaluate the utility of our system for people who have visual impairments in the context of understanding Twitter tweets containing line graphs.","PeriodicalId":6783,"journal":{"name":"2020 25th International Conference on Pattern Recognition (ICPR)","volume":"11 1","pages":"10188-10195"},"PeriodicalIF":0.0,"publicationDate":"2021-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87384891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"2D Discrete Mirror Transform for Image Non-Linear Approximation","authors":"Alessandro Gnutti, Fabrizio Guerrini, R. Leonardi","doi":"10.1109/ICPR48806.2021.9412019","DOIUrl":"https://doi.org/10.1109/ICPR48806.2021.9412019","url":null,"abstract":"In this paper, a new 2D transform named the Discrete Mirror Transform (DMT) is presented. The DMT is computed by decomposing a signal into its even and odd parts around an optimal location in a given direction, so that the signal energy is maximally split between the two components. After minimizing the information required to regenerate the original signal by removing redundant structures, the process is iterated, leading the signal energy to concentrate in a progressively smaller set of coefficients. The DMT can be displayed as a binary tree, where each node represents the single (even or odd) signal derived from the decomposition at the previous level. An optimized version of the DMT (ODMT) is also introduced, which exploits the possibility of choosing different directions along which to perform the decomposition. Experimental simulations have been carried out to test the sparsity properties of the DMT and ODMT when applied to images: for both transforms, the results show superior performance with respect to the popular Discrete Cosine Transform (DCT) and Discrete Wavelet Transform (DWT) in terms of non-linear approximation.","PeriodicalId":6783,"journal":{"name":"2020 25th International Conference on Pattern Recognition (ICPR)","volume":"23 1","pages":"9311-9317"},"PeriodicalIF":0.0,"publicationDate":"2021-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87298266","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-modal Contextual Graph Neural Network for Text Visual Question Answering","authors":"Yaoyuan Liang, Xin Wang, Xuguang Duan, Wenwu Zhu","doi":"10.1109/ICPR48806.2021.9412891","DOIUrl":"https://doi.org/10.1109/ICPR48806.2021.9412891","url":null,"abstract":"Text visual question answering (TextVQA) aims to answer questions related to texts appearing in the given images, posing more challenges than VQA by requiring a deeper recognition and understanding of human-readable scene texts of various shapes, as well as their meanings in different contexts. Existing works on TextVQA suffer from two weaknesses: i) scene texts and non-textual objects are processed separately and independently, without considering their mutual interactions during the question understanding and answering process; ii) scene texts are encoded only through word embeddings, without taking into account the corresponding visual appearance features as well as their potential relationships with other non-textual objects in the images. To overcome these weaknesses, we propose a novel multi-modal contextual graph neural network (MCG) model for TextVQA. The proposed MCG model can capture the relationships between the visual features of scene texts and non-textual objects in the given images, as well as utilize richer sources of multi-modal features to improve model performance. In particular, we encode the scene texts into richer features containing textual, visual and positional features, then model the visual relations between scene texts and non-textual objects through a contextual graph neural network. Our extensive experiments on a real-world dataset demonstrate the advantages of the proposed MCG model over baseline approaches.","PeriodicalId":6783,"journal":{"name":"2020 25th International Conference on Pattern Recognition (ICPR)","volume":"79 1","pages":"3491-3498"},"PeriodicalIF":0.0,"publicationDate":"2021-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84718296","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DenseRecognition of Spoken Languages","authors":"Jaybrata Chakraborty, Bappaditya Chakraborty, U. Bhattacharya","doi":"10.1109/ICPR48806.2021.9412413","DOIUrl":"https://doi.org/10.1109/ICPR48806.2021.9412413","url":null,"abstract":"In the present study, we have considered a large number (27) of Indian languages for recognition from their speech signals obtained from different sources. A dense convolutional network architecture (DenseNet) has been used for this classification task. Dynamic elimination of low-energy frames from the input speech signal has been applied as a preprocessing operation. The Mel-spectrogram of the pre-processed speech signal is fed as input to the DenseNet architecture. The language recognition performance of this architecture has been compared with that of several state-of-the-art deep architectures, including a convolutional neural network (CNN), ResNet, and CNN-BLSTM and DenseNet-BLSTM hybrid architectures. Additionally, we have obtained the recognition performance of a stacked BLSTM architecture fed with different sets of handcrafted features for comparison purposes. Simulations for both speaker-independent and speaker-dependent scenarios have been performed on two different standard datasets: (i) the IITKGP-MLILSC dataset of news clips in 27 different Indian languages and (ii) the Linguistic Data Consortium (LDC) dataset of telephonic conversations in 5 different Indian languages. In each case, the recognition performance of the DenseNet architecture with Mel-spectrogram features has been found to be significantly better than that of all other frameworks implemented in this study.","PeriodicalId":6783,"journal":{"name":"2020 25th International Conference on Pattern Recognition (ICPR)","volume":"78 1","pages":"9674-9681"},"PeriodicalIF":0.0,"publicationDate":"2021-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85264668","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"3D Point Cloud Registration Based on Cascaded Mutual Information Attention Network","authors":"Xiang Pan, Xiaoyi Ji, Sisi Cheng","doi":"10.1109/ICPR48806.2021.9413083","DOIUrl":"https://doi.org/10.1109/ICPR48806.2021.9413083","url":null,"abstract":"For 3D point cloud registration, how to improve the local feature correlation between two point clouds is a challenging problem. In this paper, we propose a cascaded mutual information attention registration network. The network improves the accuracy of point cloud registration by stacking residual structures and using lateral connections. Firstly, a local reference coordinate system is defined via a spherical representation of the local point set, which improves the stability and reliability of local features under noise. Secondly, an attention structure is used to increase the network depth and ensure the convergence of the network. Furthermore, a lateral connection is introduced into the network to avoid the loss of features during concatenation. In the experimental part, the results of different algorithms are compared. The proposed cascaded network is found to enhance the correlation of local features between different point clouds; as a result, it improves registration accuracy significantly over DCP and other typical algorithms.","PeriodicalId":6783,"journal":{"name":"2020 25th International Conference on Pattern Recognition (ICPR)","volume":"85 1","pages":"10644-10649"},"PeriodicalIF":0.0,"publicationDate":"2021-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85266240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}