{"title":"CNN-Based Fast CU Partitioning Algorithm for VVC Intra Coding","authors":"Jun Xu, Guoqing Wu, Chen Zhu, Yan Huang, Li Song","doi":"10.1109/ICIP46576.2022.9897378","DOIUrl":"https://doi.org/10.1109/ICIP46576.2022.9897378","url":null,"abstract":"Over a year has passed since the finalization of Versatile Video Coding (H.266/VVC), yet it is still far from practical deployment, a major reason being the excessive complexity. The flexible and sophisticated quad-tree with nested multi-type tree partitioning structure in VVC provides considerable performance gains while bringing about an exponential increase in encoding time. To reduce the coding complexity, this paper proposes a Convolutional Neural Network (CNN) based fast Coding Unit (CU) partitioning algorithm for intra coding, which accelerates CU partition through predicting the partition modes with texture information and terminating redundant modes in advance. Corresponding classifiers are designed for different CU sizes to improve prediction accuracy. Low rate-distortion performance degradation is guaranteed by introducing performance loss due to misclassification into the loss function. Experiments show that the proposed method can save encoding time ranging from 38.39% to 62.33% with 0.92% to 2.36% bit rate increase.","PeriodicalId":387035,"journal":{"name":"2022 IEEE International Conference on Image Processing (ICIP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129118409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adversarial Training with Channel Attention Regularization","authors":"Seungju Cho, Junyoung Byun, Myung-Joon Kwon, Yoon-Ji Kim, Changick Kim","doi":"10.1109/ICIP46576.2022.9897754","DOIUrl":"https://doi.org/10.1109/ICIP46576.2022.9897754","url":null,"abstract":"Adversarial attack shows that deep neural networks (DNNs) are highly vulnerable to small perturbation. Currently, one of the most effective ways to defend against adversarial attacks is adversarial training, which generates adversarial examples during training and induces the models to classify them correctly. To further increase robustness, various techniques such as exploiting additional unlabeled data and novel training loss have been proposed. In this paper, we propose a novel regularization method that exploits latent features, which can be easily combined with existing approaches. We discover that particular channels are more sensitive to adversarial perturbation, motivating us to propose regularizing these channels. Specifically, we attach a channel attention module for adjusting sensitivity of each channel by reducing the difference between the latent feature of the natural image and that of the adversarial image, which we call Channel Attention Regularization (CAR). CAR can be combined with the existing adversarial training framework, showing that it improves the robustness of state-of-the-art defense models. Experiments on various existing adversarial training methods against diverse attacks show the effectiveness of our methods. Codes are available at https://github.com/sgmath12/Adversarial-Training-CAR.","PeriodicalId":387035,"journal":{"name":"2022 IEEE International Conference on Image Processing (ICIP)","volume":"33 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120873324","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AI-Based Compression: A New Unintended Counter Attack on JPEG-Related Image Forensic Detectors?","authors":"Alexandre Berthet, J. Dugelay","doi":"10.1109/ICIP46576.2022.9897697","DOIUrl":"https://doi.org/10.1109/ICIP46576.2022.9897697","url":null,"abstract":"The detection of forged images is an important topic in digital image forensics. There are two main types of forgery: copy-move and splicing. These forgeries are created with image editors that apply JPEG compression by default, when saving the forged images. As a result, the authentic and falsified areas have different compression statistics, including histograms of DCT coefficients that show inconsistencies in the case of double JPEG compression. Therefore, the detection of double JPEG compression (DJPEG-C) is an important topic for JPEG-related image forensic detectors. Since the emergence of deep learning in image processing, AI-based compression methods have been proposed. This paper is the first to consider AI-based compression with digital image analysis tools. The objective is to understand whether AI-based compression can be a new unintended counter-attack for JPEG-related image forensic detectors. To verify our hypothesis, we selected the best detector to date, an AI-based compression method and the Casia v2 database that contains both splicing and copy-move (all publicly available). We focused our experiment on benign post-processing operations: AI-based and JPEG recompressions (with different quality levels). The evaluation is performed using different metrics (average precision, F1 score and accuracy, PSNR, SSIM) to take into account both the impact on detection and image quality. At similar image quality, AI-based recompression achieves a decrease in performance at least twice higher than JPEG, while preserving high visual image quality. Thus, AI-based compression is a new unintended counter-attack, which can no longer be ignored in future studies on image forensic detectors.","PeriodicalId":387035,"journal":{"name":"2022 IEEE International Conference on Image Processing (ICIP)","volume":"127 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121288141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Maskformer with Improved Encoder-Decoder Module for Semantic Segmentation of Fine-Resolution Remote Sensing Images","authors":"Zhuoxuan Li, Junli Yang, Bin Wang, Yaqi Li, Ting Pan","doi":"10.1109/ICIP46576.2022.9897888","DOIUrl":"https://doi.org/10.1109/ICIP46576.2022.9897888","url":null,"abstract":"In 2021, the Transformer based models have demonstrated extraordinary achievement in the field of computer vision. Among which, Maskformer, a Transformer based model adopting the mask classification method, is an outstanding model in both semantic segmentation and instance segmentation. Considering the specific characteristics of semantic segmentation of remote sensing images (RSIs), we design CADA-MaskFormer(a Mask classification-based model with Cross-shaped window self-Attention and Densely connected feature Aggregation) based on Maskformer by improving its encoder and pixel decoder. Concretely, the mask classification that generates one or even more masks for specific category to perform the elaborate segmentation is especially suitable for handling the characteristic of large within-class and small between-class variance of RSIs. Furthermore, we apply the Cross-Shaped Window self-attention mechanism to model the long-range context information contained in RSIs at maximum extent without the increasing of computational complexity. In addition, the Densely Connected Feature Aggregation Module (DCFAM) is used as the pixel decoder to incorporate multi-level feature maps from the encoder to get a finer semantic segmentation map. Extensive experiments conducted on two remotely sensed semantic segmentation datasets Potsdam and Vaihingen achieves 91.88% and 91.01% in OA index respectively, outperforming most of competitive models designed for RSIs. The code is available from https://github.com/lqwrl542293/JL-Yang_CV/tree/master/CADA_Maskformer","PeriodicalId":387035,"journal":{"name":"2022 IEEE International Conference on Image Processing (ICIP)","volume":"569 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116253632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Diagnosing Autism Spectrum Disorder Using Ensemble 3D-CNN: A Preliminary Study","authors":"Jingsheng Deng, Md Rakibul Hasan, Minhaz Mahmud, Ma Farsi Hasan, K. A. Ahmed, Md. Zakir Hossain","doi":"10.1109/ICIP46576.2022.9897628","DOIUrl":"https://doi.org/10.1109/ICIP46576.2022.9897628","url":null,"abstract":"Autism spectrum disorder (ASD) is a neuro-developmental disorder that results in behavioural retardation in verbal communications and social interactions. Traditional ASD detection methods involve assessing patients’ behavioural patterns by medical practitioners, which often lack credibility and precision. The contribution of the current study involves a 3D-CNN (convolutional neural network) model to diagnose ASD patients from healthy individuals using functional magnetic resonance imaging (fMRI) of the brain. We utilised a publicly available dataset, Autism Brain Imaging Data Exchange (ABIDE I), and tested different CNN-based models in individual and combined brain parcellations. Our model showed a better outcome (74.53% accuracy, 69.98% sensitivity, and 76.00% specificity) for combined parcellations than individuals. Further, we compared our model with several state-of-the-art models and discussed the suitability of our model for future prospects. The current model would be a predecessor of future prognosis models or behavioural patterns-based multi-modal models for early detection of ASD.","PeriodicalId":387035,"journal":{"name":"2022 IEEE International Conference on Image Processing (ICIP)","volume":"132 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116346399","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-label Aerial Image Classification Based on Image-Specific Concept Graphs","authors":"Dan Lin, Zhikui Chen, Liang Zhao, Kai Wang","doi":"10.1109/ICIP46576.2022.9897476","DOIUrl":"https://doi.org/10.1109/ICIP46576.2022.9897476","url":null,"abstract":"Multi-label aerial image classification (MAIC) is a fundamental but challenging task for computer vision-based remote sensing applications. Existing MAIC models suffer from the insufficient semantic information of image and label representations. To this end, we integrate commonsense knowledge into the MAIC task and propose a novel Knowledge-augmented Concept Graph Learning (KCGL) framework. KCGL first collects relevant semantic concepts for each label from a commonsense knowledge graph ConceptNet. With the guidance of semantic concepts, an image decoupling module is employed to extract concept-specific image features from the input image. Then, KCGL constructs an individual concept graph for each image, in which nodes are corresponding to concept-specific image features and edges are their relations extracted from ConceptNet. Finally, the classification probability on each label is computed in the specific concept graph via a GCN-based encoder-decoder model. Experimental results prove that the proposed KCGL outperforms existing state-of-the-art MAIC models on two aerial image datasets.","PeriodicalId":387035,"journal":{"name":"2022 IEEE International Conference on Image Processing (ICIP)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127702338","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"3DCT Reconstruction from a Single X-Ray Projection Using Convolutional Neural Network","authors":"Estelle Loÿen, D. Dasnoy-Sumell, B. Macq","doi":"10.1109/ICIP46576.2022.9897902","DOIUrl":"https://doi.org/10.1109/ICIP46576.2022.9897902","url":null,"abstract":"The treatment of mobile tumors remains complex in radiotherapy due to breathing-related organ movements. A solution to ensure target coverage during the process is to plan the treatment taking into account safety margins. One way to significantly reduce these safety margins would be to adapt the treatment in real time using image-guided radiation therapy. The acquisition of x-ray projections during treatment is commonly used to localise the tumor in 2D but doesn’t provide 3D information. Hence, the aim of this work is to reconstruct a high resolution 3D image based on a single radiograph in order to know the 3D position of the tumor and the organs. This is done using a convolutional neural network. The results show that the proposed method is able to reconstruct a 3DCT based on a 2D projection x-ray only. The normalized root mean square error is computed between the ground truth 3DCT and the predicted 3DCT, and the mean of this metric is between 0.02713 and 0.02776 depending on the patient.","PeriodicalId":387035,"journal":{"name":"2022 IEEE International Conference on Image Processing (ICIP)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127720528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Novel Rank Correlation Measure for Manifold Learning on Image Retrieval and Person Re-ID","authors":"Lucas Pascotti Valem, Vinicius Atsushi Sato Kawai, Vanessa Helena Pereira-Ferrero, D. C. G. Pedronette","doi":"10.1109/ICIP46576.2022.9898060","DOIUrl":"https://doi.org/10.1109/ICIP46576.2022.9898060","url":null,"abstract":"Effectively measuring similarity among data samples represented as points in high-dimensional spaces remains a major challenge in retrieval, machine learning, and computer vision. In these scenarios, unsupervised manifold learning techniques grounded on rank information have been demonstrated to be a promising solution. However, various methods rely on rank correlation measures, which often depend on a proper definition of neighborhood size. On current approaches, this definition may lead to a reduction in the final desired effectiveness. In this work, a novel rank correlation measure robust to such variations is proposed for manifold learning approaches. The proposed measure is suitable for diverse scenarios and is validated on a Manifold Learning Algorithm based on Correlation Graph (CG). The experimental evaluation considered 6 datasets on general image retrieval and person Re-ID, achieving results superior to most state-of-the-art methods.","PeriodicalId":387035,"journal":{"name":"2022 IEEE International Conference on Image Processing (ICIP)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126428688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-View Feature Boosting Network for Deep Subspace Clustering","authors":"Jinjoo Song, Gangjoon Yoon, Sangwon Baek, S. Yoon","doi":"10.1109/ICIP46576.2022.9897575","DOIUrl":"https://doi.org/10.1109/ICIP46576.2022.9897575","url":null,"abstract":"Subspace clustering is widely used to find clusters in different subspaces within a dataset. Autoencoders are popular deep subspace clustering methods using feature extraction and dimensional reduction. However, neural networks are vulnerable to overfitting, and therefore have limited potential for unsupervised subspace clustering. This paper proposes a deep multi-view subspace clustering network with feature boosting module to successfully extract meaningful features in different views and to fuse multi-view representations in a complementary manner for enhanced clustering results. The multi-view boosting provides the robust features for unsupervised clustering by emphasizing the features and removing the redundant noise. Quantitative and qualitative analysis on various benchmark datasets verifies that the proposed method outperforms state-of-the-art subspace clustering methods.","PeriodicalId":387035,"journal":{"name":"2022 IEEE International Conference on Image Processing (ICIP)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125609326","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}