{"title":"Deep unfolding for hyper sharpening using a high-frequency injection module","authors":"J. Mifdal, Marc Tomás-Cruz, A. Sebastianelli, B. Coll, Joan Duran","doi":"10.1109/CVPRW59228.2023.00204","DOIUrl":"https://doi.org/10.1109/CVPRW59228.2023.00204","url":null,"abstract":"The fusion of multi-source data with different spatial and spectral resolutions is a crucial task in many remote sensing and computer vision applications. Model-based fusion methods are more interpretable and. flexible than pure data-driven networks, but their performance depends greatly on the established fusion model and. the hand-crafted, prior. In this work, we propose an end-to-end trainable model-based. network for hyperspectral and panchromatic image fusion. We introduce an energy functional that takes into account classical observation models and. incorporates a high-frequency injection constraint. The resulting optimization function is solved by a forward-backward splitting algorithm and. unfolded into a deep-learning framework that uses two modules trained, in parallel to ensure both data observation fitting and constraint compliance. Extensive experiments are conducted, on the remote-sensing hyperspectral PRISMA dataset and on the CAVE dataset, proving the superiority of the proposed deep unfolding network qualitatively and quantitatively.","PeriodicalId":355438,"journal":{"name":"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115583048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Contrastive Learning for Depth Prediction","authors":"Rizhao Fan, Matteo Poggi, S. Mattoccia","doi":"10.1109/CVPRW59228.2023.00325","DOIUrl":"https://doi.org/10.1109/CVPRW59228.2023.00325","url":null,"abstract":"Depth prediction is at the core of several computer vision applications, such as autonomous driving and robotics. It is often formulated as a regression task in which depth values are estimated through network layers. Unfortunately, the distribution of values on depth maps is seldom explored. Therefore, this paper proposes a novel framework combining contrastive learning and depth prediction, allowing us to pay more attention to depth distribution and consequently enabling improvements to the overall estimation process. Purposely, we propose a window-based contrastive learning module, which partitions the feature maps into non-overlapping windows and constructs contrastive loss within each one. Forming and sorting positive and negative pairs, then enlarging the gap between the two in the representation space, constraints depth distribution to fit the feature of the depth map. Experiments on KITTI and NYU datasets demonstrate the effectiveness of our framework.","PeriodicalId":355438,"journal":{"name":"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115606033","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stereo Cross Global Learnable Attention Module for Stereo Image Super-Resolution","authors":"Yuanbo Zhou, Yuyang Xue, Wei Deng, Ruofeng Nie, Jiajun Zhang, Jiaqi Pu, Qinquan Gao, Junlin Lan, T. Tong","doi":"10.1109/CVPRW59228.2023.00146","DOIUrl":"https://doi.org/10.1109/CVPRW59228.2023.00146","url":null,"abstract":"Stereo super-resolution is a technique that utilizes corresponding information from multiple viewpoints to enhance the texture of low-resolution images. In recent years, numerous impressive works have advocated attention mechanisms based on epipolar constraints to boost the performance of stereo super-resolution. However, techniques that exclusively depend on epipolar constraint attention are insufficient to recover realistic and natural textures for heavily corrupted low-resolution images. We noticed that global self-similarity features within the image and across the views can proficiently fix the texture details of low-resolution images that are severely damaged. Therefore, in the current paper, we propose a stereo cross global learnable attention module (SCGLAM), aiming to improve the performance of stereo super-resolution. The experimental outcomes show that our approach outperforms others when dealing with heavily damaged low-resolution images. The relevant code is made available on this link as open source.","PeriodicalId":355438,"journal":{"name":"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116908662","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PanopticRoad: Integrated Panoptic Road Segmentation Under Adversarial Conditions","authors":"Hidetomo Sakaino","doi":"10.1109/CVPRW59228.2023.00367","DOIUrl":"https://doi.org/10.1109/CVPRW59228.2023.00367","url":null,"abstract":"Segmentation becomes one of the most important methods for scene understanding. Segmentation plays a central role in recognizing things and stuff in a scene. Among all things and stuff in a scene, the road guides vehicles in the cities and highways. Most segmentation models, i.e., semantic, instance, and panoptic segmentation, have focused on images with clear daytime weather conditions. Few papers have tackled nighttime vision under adversarial conditions, i.e., fog, rain, snow, strong illumination, and disaster events. Moreover, further segmentation of road conditions like dry, wet, and snow is still challenging under such invisible conditions. Weather impacts not only visibility but also roads and their surrounding environment, causing vital disasters with obstacles on the road, i.e., rocks and water. This paper proposes PanopticRoad with five Deep Learning-based modules for road condition segmentation under adversarial conditions: DeepReject/Scene/Snow/Depth/Road. Integration of them helps refine the failure of local road conditions where weather and physical constraints are applied. Using foggy and heavy snowfall nighttime road images and disaster images, the superiority of PanopticRoad is demonstrated over state-of-the-art panoptic-based and adaptive domain-based Deep Learning models in terms of stability, robustness, and accuracy.","PeriodicalId":355438,"journal":{"name":"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"124 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117331068","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Simulating Task-Free Continual Learning Streams From Existing Datasets","authors":"A. Chrysakis, Marie-Francine Moens","doi":"10.1109/CVPRW59228.2023.00250","DOIUrl":"https://doi.org/10.1109/CVPRW59228.2023.00250","url":null,"abstract":"Task-free continual learning is the subfield of machine learning that focuses on learning online from a stream whose distribution changes continuously over time. In contrast, previous works evaluate task-free continual learning using streams with distributions that change not continuously, but only at a few distinct points in time. In order to address the discrepancy between the definition and evaluation of task-free continual learning, we propose a principled algorithm that can permute any labeled dataset into a stream that is continuously nonstationary. We empirically show that the streams generated by our algorithm are less structured than the ones conventionally used in the literature. Moreover, we use our simulated task-free streams to benchmark multiple methods applicable to the task-free setting. We hope that our work will allow other researchers to better evaluate learning performance on continuously nonstationary streams.","PeriodicalId":355438,"journal":{"name":"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116314355","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Sim-to-Real Industrial Parts Classification with Synthetic Dataset","authors":"Xiaomeng Zhu, Talha Bilal, Pär Mårtensson, Lars Hanson, Mårten Björkman, A. Maki","doi":"10.1109/CVPRW59228.2023.00468","DOIUrl":"https://doi.org/10.1109/CVPRW59228.2023.00468","url":null,"abstract":"This paper is about effectively utilizing synthetic data for training deep neural networks for industrial parts classification, in particular, by taking into account the domain gap against real-world images. To this end, we introduce a synthetic dataset that may serve as a preliminary testbed for the Sim-to-Real challenge; it contains 17 objects of six industrial use cases, including isolated and assembled parts. A few subsets of objects exhibit large similarities in shape and albedo for reflecting challenging cases of industrial parts. All the sample images come with and without random backgrounds and post-processing for evaluating the importance of domain randomization. We call it Synthetic Industrial Parts dataset (SIP-17). We study the usefulness of SIP-17 through benchmarking the performance of five state-of-the-art deep network models, supervised and self-supervised, trained only on the synthetic data while testing them on real data. By analyzing the results, we deduce some insights on the feasibility and challenges of using synthetic data for industrial parts classification and for further developing larger-scale synthetic datasets. Our dataset † and code ‡ are publicly available.","PeriodicalId":355438,"journal":{"name":"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123305975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A New Dataset and Approach for Timestamp Supervised Action Segmentation Using Human Object Interaction","authors":"S. Sayed, Reza Ghoddoosian, Bhaskar Trivedi, V. Athitsos","doi":"10.1109/CVPRW59228.2023.00315","DOIUrl":"https://doi.org/10.1109/CVPRW59228.2023.00315","url":null,"abstract":"This paper focuses on leveraging Human Object Interaction (HOI) information to improve temporal action segmentation under timestamp supervision, where only one frame is annotated for each action segment. This information is obtained from an off-the-shelf pre-trained HOI detector, that requires no additional HOI-related annotations in our experimental datasets. Our approach generates pseudo labels by expanding the annotated timestamps into intervals and allows the system to exploit the spatio-temporal continuity of human interaction with an object to segment the video. We also propose the (3+1)Real-time Cooking (ReC)1 dataset as a realistic collection of videos from 30 participants cooking 15 breakfast items. Our dataset has three main properties: 1) to our knowledge, the first to offer synchronized third and first person videos, 2) it incorporates diverse actions and tasks, and 3) it consists of high resolution frames to detect fine-grained information. In our experiments we benchmark state-of-the-art segmentation methods under different levels of supervision on our dataset. We also quantitatively show the advantages of using HOI information, as our framework improves its baseline segmentation method on several challenging datasets with varying viewpoints, providing improvements of up to 10.9% and 5.3% in F1 score and frame-wise accuracy respectively.","PeriodicalId":355438,"journal":{"name":"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"113 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123427572","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IPD-Net: SO(3) Invariant Primitive Decompositional Network for 3D Point Clouds","authors":"R. Tabib, Nitishkumar Upasi, Tejas Anvekar, Dikshit Hegde, U. Mudenagudi","doi":"10.1109/CVPRW59228.2023.00274","DOIUrl":"https://doi.org/10.1109/CVPRW59228.2023.00274","url":null,"abstract":"In this paper, we propose IPD-Net: Invariant Primitive Decompositional Network, a SO(3) invariant framework for decomposition of a point cloud. The human cognitive system is able to identify and interpret familiar objects regardless of their orientation and abstraction. Recent research aims to bring this capability to machines for understanding the 3D world. In this work, we present a framework inspired by human cognition to decompose point clouds into four primitive 3D shapes (plane, cylinder, cone, and sphere) and enable machines to understand the objects irrespective of its orientations. We employ Implicit Invariant Features (IIF) to learn local geometric relations by implicitly representing the point cloud with enhanced geometric information invariant towards SO(3) rotations. We also use Spatial Rectification Unit (SRU) to extract invariant global signatures. We demonstrate the results of our proposed methodology for SO(3) invariant decomposition on TraceParts Dataset, and show the generalizability of proposed IPD-Net as plugin for downstream task on classification of point clouds. We compare the results of classification with state-of-the-art methods on benchmark dataset (ModelNet40).","PeriodicalId":355438,"journal":{"name":"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121978822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unsupervised Bidirectional Style Transfer Network using Local Feature Transform Module","authors":"K. Bae, Hyungil Kim, Y. Kwon, Jinyoung Moon","doi":"10.1109/CVPRW59228.2023.00081","DOIUrl":"https://doi.org/10.1109/CVPRW59228.2023.00081","url":null,"abstract":"In this paper, we propose a bidirectional style transfer method by exchanging the style of inputs while preserving the structural information. The proposed bidirectional style transfer network consists of three modules: 1) content and style extraction module that extracts the structure and style-related features, 2) local feature transform module that aligns locally extracted feature to its original coordinate, and 3) reconstruction module that generates a newly stylized image. Given two input images, we extract content and style information from both images in a global and local manner, respectively. Note that the content extraction module removes style-related information by compressing the dimension of the feature tensor to a single channel. The style extraction module removes content information by gradually reducing the spatial size of a feature tensor. The local feature transform module exchanges the style information and spatially transforms the local features to its original location. By substituting the style information with one another in both ways (i.e., global and local) bidirectionally, the reconstruction module generates a newly stylized image without diminishing the core structure. Furthermore, we enable the proposed network to control the degree of style to be applied when exchanging the style of inputs bidirectionally. Through the experiments, we compare the bidirectionally style transferred results with existing methods quantitatively and qualitatively. We show generation results by controlling the degree of applied style and adopting various textures to an identical structure.","PeriodicalId":355438,"journal":{"name":"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117086625","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Multi-exposure Image Fusion via Filter-dominated Fusion and Gradient-driven Unsupervised Learning","authors":"Kaiwen Zheng, Jie Huang, Huikang Yu, Fengmei Zhao","doi":"10.1109/CVPRW59228.2023.00281","DOIUrl":"https://doi.org/10.1109/CVPRW59228.2023.00281","url":null,"abstract":"Multi exposure image fusion (MEF) aims to produce images with a high dynamic range of visual perception by integrating complementary information from different exposure levels, bypassing common sensors’ physical limits. Despite the marvelous progress made by deep learning-based methods, few considerations have been given to the innovation of fusion paradigms, leading to insufficient model capacity utilization. This paper proposes a novel filter prediction-dominated fusion paradigm toward a simple yet effective MEF. Precisely, we predict a series of spatial-adaptive filters conditioned on the hierarchically represented features to perform an image-level dynamic fusion. The proposed paradigm has the following merits over the previous: 1) it circumvents the risk of information loss arising from the implicit encoding and decoding processes within the neural network, and 2) it better integrates local information to obtain better continuous spatial representations than the weight map-based paradigm. Furthermore, we propose a Gradient-driven Image Fidelity (GIF) loss for unsupervised MEF. Empowered by the exploitation of informative property in the gradient domain, GIF is able to implement a stable distortion-free optimization process. Experimental results demonstrate that our method achieves the best visual performance compared to the state-of-the-art while achieving an almost 30% improvement in inference time. The code is available at https://github.com/keviner1/FFMEF.","PeriodicalId":355438,"journal":{"name":"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"115 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117110708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}