{"title":"Gaussian Distributed Graph Constrained Multi-Modal Gaussian Process Latent Variable Model for Ordinal Labeled Data","authors":"Keisuke Maeda, Takahiro Ogawa, M. Haseyama","doi":"10.1109/ICIP46576.2022.9898070","DOIUrl":"https://doi.org/10.1109/ICIP46576.2022.9898070","url":null,"abstract":"This paper proposes a Gaussian distributed graph constrained multi-modal Gaussian process latent variable model for ordinal labeled data. Rating data that are used in various real-world applications such as product recommendation can represent user preferences, but the difference between adjacent ratings is often uncertain due to the user’s ambiguity. In order to capture the relationships among multi-modal data including rating data, consideration of the uncertainty is necessary. Therefore, by applying the Gaussian distribution to the rating data, we calculate distributed labels that implicitly include the uncertainty, and thus, the Gaussian distributed graph based on their similarities can be constructed. By introducing a constraint calculated based on the graph Laplacian of the Gaussian distributed graph into the objective function of the multi-modal Gaussian process latent variable model, we can achieve an effective latent space that can consider a label correlation while accounting for the uncertainty. This is the contribution of this paper. The effectiveness of the proposed method is verified by experiments using some open datasets.","PeriodicalId":387035,"journal":{"name":"2022 IEEE International Conference on Image Processing (ICIP)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123763250","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning from Noisy Labels via Meta Credible Label Elicitation","authors":"Ziyang Gao, Yaping Yan, Xin Geng","doi":"10.1109/ICIP46576.2022.9897577","DOIUrl":"https://doi.org/10.1109/ICIP46576.2022.9897577","url":null,"abstract":"Deep neural networks (DNNS) can easily overfit to noisy data, which leads to a significant degradation of performance. Previous efforts are primarily made by label correction or sample selection to alleviate supervision problem. To distinguish between noisy labels and clean labels, we propose a meta-learning framework which could gradually elicit credible labels via the meta-gradient descent step under the guidance of potentially non-noisy samples. Specifically, by exploiting the topological information of feature space, we can automatically estimate label confidence with a meta-learner. An iterative procedure is designed to select the most trustworthy noisy-labeled instances to generate pseudo labels. Then we train DNNs with pseudo supervision and original noisy super vision, which learns sufficiency and robustness properties in a joint learning objective. Experimental results on benchmark classification datasets show the superiority of our approach against the state-of-the-art methods.","PeriodicalId":387035,"journal":{"name":"2022 IEEE International Conference on Image Processing (ICIP)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126644191","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Video-Grounded Dialogues with Joint Video and Image Training","authors":"Han Zhang, Yingming Li, Zhongfei Zhang","doi":"10.1109/ICIP46576.2022.9897613","DOIUrl":"https://doi.org/10.1109/ICIP46576.2022.9897613","url":null,"abstract":"In this paper, we propose a multi-modal transformer model for end-to-end training of video-grounded dialogue generation. In particular, LayerScale regularized spatio-temporal self-attention blocks are first introduced to enable us to flexibly train end-to-end from both video and image data, without extracting offline visual features. Further, a pre-trained generative language architecture BART is employed to encode different modalities and perform dialogue generation. Extensive experiments on Audio-Visual Scene-Aware Dialog (AVSD) dataset demonstrate its effectiveness and superiority to the state-of-the-art methods.","PeriodicalId":387035,"journal":{"name":"2022 IEEE International Conference on Image Processing (ICIP)","volume":"1108 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116058823","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Self-Supervised Cooperative Colorization of Achromatic Faces","authors":"Hitika Tiwari, K. Venkatesh","doi":"10.1109/ICIP46576.2022.9897765","DOIUrl":"https://doi.org/10.1109/ICIP46576.2022.9897765","url":null,"abstract":"Despite the recent progress in deep learning-based face image colorization techniques, there is still much room for improvement. One of the significant challenges is the bias toward specific skin color. Moreover, the conventional face colorization approaches aim to produce colored 2D face images, whereas the generation of colored 3D faces from monocular achromatic (gray-scale) images is beyond the scope of these methods despite having immense potential applications. To address these issues, we propose Self-Supervised COoperative COlorizaTion of Achromatic Faces (COCOTA) framework that contains chromatic and achromatic pipelines to jointly estimate the color and shape of 3D faces using monocular achromatic face images without inducing any specific color bias. On the challenging CelebA test dataset, COCOTA out-performs the current state-of-the-art method by a large margin (e.g., for 3D color-based error, a reduction from 5.12 ± 0.13 to 3.09 ± 0.08 leading to an improvement of 39.6%), demonstrating the effectiveness of the proposed method.","PeriodicalId":387035,"journal":{"name":"2022 IEEE International Conference on Image Processing (ICIP)","volume":"515 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116209644","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Self-Training Weakly-Supervised Framework for Pathologist-Like Histopathological Image Analysis","authors":"Laetitia Launet, Adrián Colomer, Andrés Mosquera-Zamudio, Anais Moscardó, C. Monteagudo, V. Naranjo","doi":"10.1109/ICIP46576.2022.9897274","DOIUrl":"https://doi.org/10.1109/ICIP46576.2022.9897274","url":null,"abstract":"The advent of artificial intelligence-based tools applied to digital pathology brings the promise of reduced workload for pathologists and enhanced patient care, not to mention medical research progress. Yet, despite its great potential, the field is hindered by the paucity of annotated histological data, a limitation for developing robust deep learning models. To reduce the number of expert annotations needed for training, we introduce a novel framework combining self-training and weakly-supervised learning that uses both annotated and unannotated data samples. Inspired by how pathologists examine biopsies, our method considers whole slide images from a bird’s eye view to roughly localize the tumor area before focusing on its features at a higher magnification level. Notwithstanding the scarcity of the dataset, the experimental results show that the proposed method outperforms models trained with annotated data only and previous works analyzing the same type of lesions, thus demonstrating the efficiency of the approach.","PeriodicalId":387035,"journal":{"name":"2022 IEEE International Conference on Image Processing (ICIP)","volume":"192 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122296749","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"3D Residual Interpolation for Spike Camera Demosaicing","authors":"Yanchen Dong, Jing Zhao, Ruiqin Xiong, Tiejun Huang","doi":"10.1109/ICIP46576.2022.9897590","DOIUrl":"https://doi.org/10.1109/ICIP46576.2022.9897590","url":null,"abstract":"The recently invented spike camera can capture high-speed motion in dynamic scenes by accumulating incoming photons continuously and firing spikes at very high temporal resolution. This paper addresses the demosaicing problem in spike camera color imaging. Specifically, we propose the 3D residual interpolation (3DRI) method to convert raw spike frames to color image frames. Due to the Poisson effect of photon arrivals and the quantization effect of spike readout, the instantaneous intensity recovered from the spike stream may suffer from undesired noise. To handle the noise, we estimate the missing color pixels along motion trajectories to exploit the temporal correlation among neighboring frames. In addition, by utilizing the color channels correlation, we design a residual-based demosaicing pipeline that uses the green pixels to guide the estimation of the red or blue missing pixels. Experimental results demonstrate our proposed 3DRI can produce color images from spike streams, achieving a good objective and perceptual quality for high-motion scenes.","PeriodicalId":387035,"journal":{"name":"2022 IEEE International Conference on Image Processing (ICIP)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122361060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accurate and Robust Image Correspondence for Structure-From-Motion and its Application to Multi-View Stereo","authors":"Shuhei Hoshi, Koichi Ito, T. Aoki","doi":"10.1109/ICIP46576.2022.9897304","DOIUrl":"https://doi.org/10.1109/ICIP46576.2022.9897304","url":null,"abstract":"In this paper, we propose a robust and accurate image correspondence method by combining SuperPoint + SuperGlue (SP+SG) and Local feature matching with TRansformers (LoFTR). The proposed method finds corresponding points on regions with rich texture by SP+SG and those with poor texture by LoFTR since SP+SG exhibits high localization accuracy of image correspondence and LoFTR exhibits high robustness against poor texture regions. The proposed method can be used for image correspondence in SfM to not only improve the estimation accuracy of camera parameters in SfM, but also to improve the reconstruction accuracy and expand the reconstruction area in MVS. Through experiments on the ETH3D dataset, we demonstrate that the proposed method achieves more accurate 3D reconstruction than conventional methods, and also show the impact of image correspondence accuracy in SfM on multi-view 3D reconstruction.","PeriodicalId":387035,"journal":{"name":"2022 IEEE International Conference on Image Processing (ICIP)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122408100","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Two-Stream Non-Uniform Concentration Reasoning Network for Single Image Air Pollution Estimation","authors":"Huilin Chen, Wenming Yang, Q. Liao","doi":"10.1109/ICIP46576.2022.9897665","DOIUrl":"https://doi.org/10.1109/ICIP46576.2022.9897665","url":null,"abstract":"With the increasing availability of portable cameras and smart phones, directly estimating PM2.5 based on digital photography shows advantages in efficiency and economic costs. In this paper, a novel Two-stream Non-uniform Concentration Reasoning Network (TNCR-Net) is proposed for single image PM2.5 concentration estimation. Motivated by locally non-uniform particle pollution concentration distribution in images, we adopt patch-based scheme and adaptive weighted average mechanism to obtain patch-wise concentration and relative weight based on spatially varying perceptual relevance of local particle pollution concentration. Then aggregate patch-wise concentrations according to relative weights. To learn more effective feature from particular pollution image, we use a two-stream network structure with the dark channel map as the input of one stream. Besides, we employ attention-based feature fusion method to flexibly aggregate the feature maps of the two streams. Experiments on real-world dataset indicate that our TNCR-Net outperforms other state-of-the-art methods with fewer parameters.","PeriodicalId":387035,"journal":{"name":"2022 IEEE International Conference on Image Processing (ICIP)","volume":"21 11","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114060598","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Anisotropic Edge Detection in Catadioptric Images","authors":"Enzhuang Zheng, Baojiang Zhong, K. Ma","doi":"10.1109/ICIP46576.2022.9897384","DOIUrl":"https://doi.org/10.1109/ICIP46576.2022.9897384","url":null,"abstract":"Catadioptric images are produced in omnidirectional vision systems and can be expressed on Riemannian manifolds. The existing edge detectors are operated either in Euclidean space, or on Riemannian manifolds with isotropic image filtering. In this paper, a new type of edge detection is proposed—it is operated on Riemannian manifolds with anisotropic image filtering. For that, an anisotropic image filtering kernel on Riemannian manifolds is derived by solving the anisotropic heat equation embedded with Riemannian metric. With this kernel, a novel anisotropic edge detector is then developed. Compared to an edge detector operated in Euclidean space, our edge detector is more suitable for catadioptric images, since their geometric structure information will be taken into account in the detection process. Compared to existing edge detectors customized for catadioptric images, the new edge detector has a higher efficiency in preserving image edges and thus can produce more true positives.","PeriodicalId":387035,"journal":{"name":"2022 IEEE International Conference on Image Processing (ICIP)","volume":"156 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114298807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dualfeat: Dual Feature Aggregation for Video Object Detection","authors":"Jingning Pan, Kaiwen Du, Y. Yan, Hanzi Wang","doi":"10.1109/ICIP46576.2022.9897580","DOIUrl":"https://doi.org/10.1109/ICIP46576.2022.9897580","url":null,"abstract":"Video object detection aims to detect and track each object in a given video. However, due to the problem of appearance deterioration in the video, it is still challenging to obtain good results when we apply traditional image object detection methods to videos. In this paper, we propose a new feature aggregation method, called Dual Feature Aggregation (DualFeat) for video object detection. By effectively combining the temporal and spatial attention mechanisms, we make full use of the temporal and spatial information in videos. Meanwhile, we leverage a real-time tracker to track detected objects in video frames, where features are aggregated again with previously obtained features. Such a way helps to obtain more comprehensive and richer features, greatly improving the accuracy of video object detection. We perform experiments on the ILSVRC2017 dataset, and the experimental results also verify the effectiveness of our method.","PeriodicalId":387035,"journal":{"name":"2022 IEEE International Conference on Image Processing (ICIP)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114397048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}