Texture-Guided End-to-End Depth Map Compression
Bo Peng, Yuying Jing, Dengchao Jin, Xiangrui Liu, Zhaoqing Pan, Jianjun Lei
2022 IEEE International Conference on Image Processing (ICIP), 2022-10-16. DOI: 10.1109/ICIP46576.2022.9897569
Abstract: End-to-end compression methods designed for texture images have achieved excellent coding performance. However, because of the characteristic differences between depth maps and texture images, texture-oriented methods are limited in depth map compression. To address this problem, this paper proposes a texture-guided end-to-end depth map compression network (TDMC-Net). TDMC-Net is mainly composed of a texture-guided transform module (TTM), which performs a nonlinear transform with texture context to reduce redundancy in the depth features, and a texture-guided conditional entropy model (TCEM), which improves the entropy model by introducing a texture conditional prior. Experimental results show that TDMC-Net boosts depth coding efficiency by utilizing texture information and achieves superior performance.

Representation Learning Using Rank Loss for Robust Neurosurgical Skills Evaluation
Britty Baby, Mustafa Chasmai, Tamajit Banerjee, A. Suri, Subhashis Banerjee, Chetan Arora
2022 IEEE International Conference on Image Processing (ICIP), 2022-10-16. DOI: 10.1109/ICIP46576.2022.9897932
Abstract: Surgical simulators provide hands-on training of the necessary psychomotor skills. Automated skill evaluation of trainee doctors from videos of the tasks they perform is a key step toward the optimal utilization of such simulators. However, current skill evaluation techniques require accurate tracking information for the instruments, which restricts their applicability to robot-assisted surgeries. In this paper, we propose a novel neural network architecture that performs skill evaluation from video data alone, with no tracking information. Given the small dataset available for training such a system, a network trained with an ℓ2 regression loss easily overfits the training data. We propose a novel rank loss that helps learn a robust representation, leading to a 5% improvement in skill score prediction on the benchmark JIGSAWS dataset. To demonstrate the applicability of our method to non-robotic surgeries, we contribute a new neuro-endoscopic technical skills (NETS) training dataset comprising 100 short videos of 12 subjects. Our method achieves a 27% improvement over the state of the art on the NETS dataset. The project page, with source code and data, is available at nets-iitd.github.io/nets-v1.

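The abstract does not give the exact form of the rank loss. As a rough illustration only, a pairwise margin-based rank term can be added to the ℓ2 regression objective, so that mis-ordered skill-score pairs are penalised even when the plain regression error is small. The function below is a minimal numpy sketch under that assumption (the `margin`, `alpha` hyperparameters are invented for the example); it is not the paper's implementation.

```python
import numpy as np

def rank_regression_loss(pred, target, margin=0.1, alpha=1.0):
    """Combine l2 regression with a pairwise ranking hinge.

    For every pair (i, j) with target[i] > target[j], penalise the
    prediction if pred[i] does not exceed pred[j] by at least `margin`.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    l2 = np.mean((pred - target) ** 2)

    # Pairwise differences; keep only pairs whose targets are ordered.
    dp = pred[:, None] - pred[None, :]      # pred[i] - pred[j]
    dt = target[:, None] - target[None, :]  # target[i] - target[j]
    ordered = dt > 0
    hinge = np.maximum(0.0, margin - dp)    # violated when pred[i] < pred[j] + margin
    rank = hinge[ordered].mean() if ordered.any() else 0.0
    return l2 + alpha * rank
```

With this combined objective, predictions that preserve the ground-truth ordering by a comfortable margin incur no ranking penalty, while swapped pairs are penalised regardless of how small their ℓ2 error is.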
Panoptic-Deeplab-DVA: Improving Panoptic Deeplab with Dual Value Attention and Instance Boundary Aware Regression
Qingfeng Liu, Mostafa El-Khamy
2022 IEEE International Conference on Image Processing (ICIP), 2022-10-16. DOI: 10.1109/ICIP46576.2022.9897430
Abstract: Panoptic DeepLab is a state-of-the-art framework that has shown a good trade-off between performance and complexity. In this paper, we focus on improving it so that low-complexity panoptic segmentation can be widely deployed on mobile devices. Specifically, we first present a novel Dual Value Attention (DVA) module that enables context-information exchange between the semantic segmentation branch and the instance segmentation branch. Second, we propose a new instance Boundary Aware Regression (iBAR) loss that places more emphasis on instance boundaries during instance regression. To assess the effectiveness of our approach, we evaluate its panoptic segmentation performance on the MSCOCO dataset, showing that it improves upon the state-of-the-art Panoptic DeepLab with both the lightweight backbone MobileNetV3 and the heavyweight backbone HRNetV2.

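The abstract does not define the iBAR loss precisely. A common way to emphasise instance boundaries in a regression loss is to build a per-pixel weight map that up-weights pixels whose neighbours carry a different instance id, then use it in a weighted regression. The numpy sketch below illustrates only that generic idea (the boundary weight of 5 is an assumed value), not the paper's loss.

```python
import numpy as np

def boundary_weight_map(mask, w_boundary=5.0):
    """Per-pixel weights that emphasise instance boundaries.

    A pixel is a boundary pixel if any 4-neighbour has a different
    instance id in `mask`; boundary pixels get weight `w_boundary`,
    all other pixels get weight 1.
    """
    m = np.asarray(mask)
    edge = np.zeros(m.shape, dtype=bool)
    edge[:-1, :] |= m[:-1, :] != m[1:, :]   # differs from neighbour below
    edge[1:, :] |= m[1:, :] != m[:-1, :]    # differs from neighbour above
    edge[:, :-1] |= m[:, :-1] != m[:, 1:]   # differs from neighbour right
    edge[:, 1:] |= m[:, 1:] != m[:, :-1]    # differs from neighbour left
    return np.where(edge, w_boundary, 1.0)

def boundary_aware_l2(pred, target, mask, w_boundary=5.0):
    """Weighted l2 regression loss with extra emphasis on boundaries."""
    w = boundary_weight_map(mask, w_boundary)
    diff = np.asarray(pred, dtype=float) - np.asarray(target, dtype=float)
    return float(np.sum(w * diff ** 2) / np.sum(w))
```

The normalisation by the weight sum keeps the loss scale comparable to an unweighted mean, so the boundary weight changes only the relative emphasis, not the overall magnitude.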
GCN-Based Multi-Modal Multi-Label Attribute Classification in Anime Illustration Using Domain-Specific Semantic Features
Ziwen Lan, Keisuke Maeda, Takahiro Ogawa, M. Haseyama
2022 IEEE International Conference on Image Processing (ICIP), 2022-10-16. DOI: 10.1109/ICIP46576.2022.9898071
Abstract: This paper presents a multi-modal multi-label attribute classification model for anime illustrations based on Graph Convolutional Networks (GCN) using domain-specific semantic features. In animation production, creators often intentionally highlight subtle characteristics of the characters and objects when drawing anime illustrations, so we focus on the task of multi-label attribute classification. To capture the relationships between attributes, we construct a multi-modal GCN model that can adopt semantic features specific to anime illustrations. To generate the domain-specific semantic features that represent the semantic content of anime illustrations, we construct a new captioning framework for anime illustrations by combining real images and their style transformations. The contributions of the proposed method are twofold: 1) more comprehensive relationships between attributes are captured by introducing a GCN with semantic features into the multi-label attribute classification task for anime illustrations; 2) more accurate image captions for anime illustrations can be generated by a model trained using only real-world images. To the best of our knowledge, this is the first work dealing with multi-label attribute classification in anime illustration. Experimental results show the effectiveness of the proposed method in comparison with existing methods, including the state of the art.

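For intuition on the GCN component: a single graph-convolution layer propagates node (here, attribute) features over the normalised relation graph. The following is a generic numpy sketch of one Kipf-and-Welling-style layer, not the paper's specific network; the adjacency, features, and weights are placeholders.

```python
import numpy as np

def gcn_layer(A, X, W):
    """One GCN layer: relu(D^{-1/2} (A + I) D^{-1/2} X W).

    A: (N, N) adjacency of the attribute-relation graph.
    X: (N, F) node features (e.g. attribute semantic embeddings).
    W: (F, F') learnable weight matrix.
    """
    A_hat = A + np.eye(len(A))                   # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(0.0, A_norm @ X @ W)       # ReLU activation
```

With no edges, the layer degenerates to a per-node dense layer; with edges, each attribute's representation mixes in its related attributes, which is how co-occurrence relationships between labels are captured.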
Undersampled Dynamic Fourier Ptychography via Phaseless PCA
Zhengyu Chen, Seyedehsara Nayer, Namrata Vaswani
2022 IEEE International Conference on Image Processing (ICIP), 2022-10-16. DOI: 10.1109/ICIP46576.2022.9897747
Abstract: In recent work, we studied the phaseless PCA (low-rank phase retrieval) problem and developed a provably correct and fast alternating-minimization (AltMin) solution called AltMinLowRaP. In this work, we develop a modification of AltMinLowRaP, called AltMinLowRaP-Ptych, designed to reduce the sample complexity (the number of measurements required for accurate recovery) of dynamic Fourier ptychographic imaging. Fourier ptychography is a computational imaging technique that enables high-resolution microscopy using multiple low-resolution cameras. Through extensive experiments on real image sequences with simulated ptychographic measurements, we show the power of our algorithm for reducing the number of samples required for accurate recovery.

PCA Event-Based Optical Flow: A Fast and Accurate 2D Motion Estimation
M. Khairallah, Fabien Bonardi, D. Roussel, S. Bouchafa
2022 IEEE International Conference on Image Processing (ICIP), 2022-10-16. DOI: 10.1109/ICIP46576.2022.9897875
Abstract: For neuromorphic vision sensors such as event-based cameras, a paradigm shift is required to adapt optical flow estimation, which is critical for many applications. To address the costly computations involved, a Principal Component Analysis (PCA) approach is adapted to the problem of event-based optical flow estimation. We propose several PCA regularization methods that efficiently enhance optical flow estimation. Furthermore, we show that the variants of our method dedicated to real-time contexts are about two times faster than state-of-the-art implementations while significantly improving optical flow accuracy.

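The abstract does not detail the proposed PCA regularizers, but the classical role of PCA in event-based flow is local plane fitting: the events fired by a moving edge lie near a plane in (x, y, t), whose normal (the smallest-eigenvalue eigenvector of the event covariance) gives the time-surface gradient and hence the normal flow. The numpy sketch below shows only that textbook baseline for intuition, not the paper's method.

```python
import numpy as np

def flow_from_events(events):
    """Estimate local optical flow from an event cloud of rows (x, y, t).

    Fits a plane to the events via PCA: the eigenvector of the covariance
    with the smallest eigenvalue is the plane normal (a, b, c). The
    time-surface gradient is then (-a/c, -b/c), and the (normal) flow
    is grad_t / |grad_t|^2.
    """
    pts = np.asarray(events, dtype=float)
    centered = pts - pts.mean(axis=0)
    cov = centered.T @ centered / len(pts)
    eigval, eigvec = np.linalg.eigh(cov)       # eigenvalues ascending
    a, b, c = eigvec[:, 0]                     # normal of the fitted plane
    grad = np.array([-a / c, -b / c])          # d t / d(x, y) on the plane
    return grad / (grad @ grad)

# Synthetic events: a vertical edge sweeping in +x at 2 px per time unit,
# so a pixel in column x fires at t = x / 2.
xs = np.repeat(np.arange(10.0), 5)
ys = np.tile(np.arange(5.0), 10)
ts = xs / 2.0
v = flow_from_events(np.stack([xs, ys, ts], axis=1))  # recovers roughly (2, 0)
```

Note that a straight edge only constrains the flow component normal to it (the aperture problem), which is exactly what the plane fit returns.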
Efficient One-Shot Sports Field Image Registration with Arbitrary Keypoint Segmentation
Nicolas Jacquelin, Romain Vuillemot, S. Duffner
2022 IEEE International Conference on Image Processing (ICIP), 2022-10-16. DOI: 10.1109/ICIP46576.2022.9897170
Abstract: Automatic sports field registration aims to project a given image, taken with unknown camera parameters, into a known 3D coordinate system in order to obtain higher-level information such as the positions and speeds of players. Existing methods generally detect specific visual landmarks on the field and then apply iterative refinement to approach the desired calibration. They are usually compared only in terms of precision on a standard benchmark, without considering other metrics. However, execution speed is also important, particularly in the context of live broadcast TV and sports analysis. This work introduces a new automatic field registration method that achieves excellent performance on the WorldCup Soccer benchmark while depending on neither specific visible landmarks nor any refinement, resulting in a one-shot model with very high execution speed. Finally, to complement the usual soccer benchmark, we introduce a new swimming pool registration benchmark that is more challenging for the task at hand. Code and dataset are available at https://github.com/njacquelin/sportsfieldregistration.

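Once keypoints have been located, registration reduces to estimating the homography between their image positions and their known positions in the field model. The mapping step itself is standard; below is a minimal numpy Direct Linear Transform (DLT) sketch of that step, not the authors' code.

```python
import numpy as np

def homography_dlt(src, dst):
    """Direct Linear Transform: find H such that dst ~ H @ src (homogeneous).

    src, dst: (N, 2) arrays of corresponding points, N >= 4, no noise
    handling (real pipelines would add normalisation and RANSAC).
    """
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The homography is the right singular vector of A with smallest
    # singular value (the null space of A for exact correspondences).
    _, _, vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

def project(H, pts):
    """Apply homography H to (N, 2) points and dehomogenise."""
    pts = np.asarray(pts, dtype=float)
    ph = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return ph[:, :2] / ph[:, 2:3]
```

With exact correspondences, four points determine H; extra points make the SVD solution a least-squares fit, which is why detecting many "arbitrary" keypoints (rather than a few fixed landmarks) can make the registration more robust.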
Guided Sampling Based Feature Aggregation for Video Object Detection
Jun Liang, Haosheng Chen, Y. Yan, Yang Lu, Hanzi Wang
2022 IEEE International Conference on Image Processing (ICIP), 2022-10-16. DOI: 10.1109/ICIP46576.2022.9897210
Abstract: Video object detection is a challenging task due to appearance deterioration in video frames. Recently, feature aggregation based methods, which aggregate context information from object proposals in different frames to improve performance, have dominated the task. However, much invalid information may be introduced during feature aggregation, since frames and proposals are usually selected at random. In this paper, we propose a guided sampling based feature aggregation network (GSFA) to perform more effective feature aggregation. Specifically, we introduce a frame-level sampling module and a proposal-level sampling module to adaptively sample informative frames and proposals from a video sequence. As a result, the proposed GSFA can effectively aggregate context information from semantically rich frames and proposals to boost performance. Experimental results on the ImageNet VID dataset show that the proposed GSFA achieves state-of-the-art performance: 84.8% mAP with ResNet-101 and 85.8% mAP with ResNeXt-101.

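The abstract does not spell out the aggregation operator. Proposal-level feature aggregation in this line of work is typically attention-style: a target proposal's feature is refined as a similarity-weighted average of support-proposal features from other frames. The sketch below is a generic numpy illustration of that pattern (the `temperature` parameter is an assumption), not GSFA itself.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def aggregate_proposals(target_feat, support_feats, temperature=1.0):
    """Attention-style aggregation of support-proposal features.

    target_feat:   (F,) feature of the proposal being refined.
    support_feats: (N, F) features of proposals sampled from other frames.
    Returns the similarity-weighted average of the support features.
    """
    sims = support_feats @ target_feat / temperature  # (N,) dot-product similarity
    w = softmax(sims)                                 # normalised attention weights
    return w @ support_feats
```

Under this scheme, randomly sampled but irrelevant proposals still receive nonzero weight, which is the "invalid information" problem the paper's guided sampling modules aim to reduce by choosing informative frames and proposals in the first place.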
An Efficient End-To-End Image Compression Transformer
Afsana Ahsan Jeny, Masum Shah Junayed, Md Baharul Islam
2022 IEEE International Conference on Image Processing (ICIP), 2022-10-16. DOI: 10.1109/ICIP46576.2022.9897663
Abstract: Image and video compression have received significant research attention and expanded their applications. Existing entropy estimation-based methods combine a hyperprior with local context, which limits their efficacy. This paper introduces an efficient end-to-end transformer-based image compression model that generates a global receptive field to tackle long-range correlation issues. A hyper encoder-decoder transformer block employs a multi-head spatial-reduction self-attention (MHSRSA) layer to minimize the computational cost of the self-attention layer and to enable rapid learning of multi-scale, high-resolution features. A Causal Global Anticipation Module (CGAM) is designed to construct highly informative adjacent contexts using channel-wise linkages and to identify global reference points in the latent space for end-to-end rate-distortion optimization (RDO). Experimental results on the Kodak dataset demonstrate the effectiveness and competitive performance of the proposed model.

A Lightweight Network with Multi-Stage Feature Fusion Module for Single-View 3D Face Reconstruction
Jing Wang, Shikun Zhang, F. Song, Ge Song, Ming Yang
2022 IEEE International Conference on Image Processing (ICIP), 2022-10-16. DOI: 10.1109/ICIP46576.2022.9897570
Abstract: 3D face reconstruction has attracted great attention from researchers in both academia and industry for its potential applications in scenarios such as face alignment and recognition across large poses. The 3D Morphable Model, which reconstructs a 3D face by predicting basis coefficients, is usually adopted as the parametric framework for 3D faces and combines well with deep learning. Existing cascade regression methods predict coefficients over multiple iterations, which is time-consuming. In this paper, we propose an efficient, end-to-end method for single-view 3D face reconstruction. We build a lightweight network based on mobile blocks for faster parameter extraction and a smaller model size. In particular, a multi-stage feature fusion module is designed to enhance the end-to-end learning. To match the setting of the input image size, we update the pose labels of images of various sizes in the training dataset before training. Extensive experiments on challenging datasets validate the efficiency of our method for both 3D face reconstruction and face alignment.
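The 3D Morphable Model that such methods build on represents a face shape as a mean shape plus a linear combination of basis shapes, so the network only has to regress the coefficient vector. A minimal numpy sketch of that decoding step (the shapes and basis here are toy placeholders, not a real 3DMM):

```python
import numpy as np

def decode_3dmm(mean_shape, basis, coeffs):
    """3DMM decoding: shape = mean + basis @ coeffs.

    mean_shape: (3N,) stacked xyz coordinates of the mean face.
    basis:      (3N, K) shape (and/or expression) basis vectors.
    coeffs:     (K,) coefficients predicted by the network.
    """
    return mean_shape + basis @ coeffs

# Toy example: 2 vertices (6 coordinates), 2 basis vectors that each
# perturb one coordinate of the first vertex.
mean_shape = np.zeros(6)
basis = np.eye(6)[:, :2]
shape = decode_3dmm(mean_shape, basis, np.array([0.5, -1.0]))
```

Because the decoding is a fixed linear map, all of the learning effort (and hence the lightweight-network design in the paper) goes into predicting the low-dimensional coefficient vector from the image.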