Title: Detecting Underwater Discrete Scatterers in Echograms with Deep Learning-Based Semantic Segmentation
Authors: Rhythm Vohra, Femina Senjaliya, Melissa Cote, Amanda Dash, A. Albu, Julek Chawarski, Steve Pearce, Kaan Ersahin
DOI: https://doi.org/10.1109/CVPRW59228.2023.00043
Venue: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), June 2023
Abstract: This paper reports on an exploratory study of the automatic detection of discrete scatterers in the water column from underwater acoustic data with deep learning (DL) networks. Underwater acoustic surveys using moored single-beam multi-frequency echosounders make environmental monitoring tasks possible in a non-invasive manner. Discrete scatterers, i.e., individual marine organisms, are particularly challenging to detect automatically due to their small size, sometimes overlapping tracks, and similarity to various types of noise. As our interest lies in identifying the presence and general location of discrete scatterers, we propose the use of a semantic segmentation paradigm over object detection or instance segmentation, and compare several state-of-the-art DL networks. We also study the effects of early and late fusion strategies for aggregating the information contained in the multi-frequency data. Experiments on the Okisollo Channel Underwater Discrete Scatterers dataset, which also includes schools of herring and juvenile salmon, air bubbles from wave and fish school activity, and significant noise bands, show that late fusion yields higher metrics, with DeepLabV3+ outperforming other networks in terms of precision and intersection over union (IoU) and Attention U-Net offering higher recall. The detection of discrete scatterers is a good example of a problem for which exact annotations cannot be obtained for various reasons; in several cases, the network outputs appear visually more adequate than the annotations (which contain inherent noise). This opens the way for iteratively improving the annotations using actual detection results.
{"title":"SC-NAFSSR: Perceptual-Oriented Stereo Image Super-Resolution Using Stereo Consistency Guided NAFSSR","authors":"Zidian Qiu, Zongyao He, Zhihao Zhan, Zilin Pan, Xingyuan Xian, Zhi Jin","doi":"10.1109/CVPRW59228.2023.00147","DOIUrl":"https://doi.org/10.1109/CVPRW59228.2023.00147","url":null,"abstract":"Stereo image Super-Resolution (SR) has made significant progress since binocular systems are widely accepted in recent years. Most stereo SR methods focus on improving the PSNR performance, while their visual quality is over-smoothing and lack of detail. Perceptual-oriented SR methods are mainly designed for single-view images, thereby their performance decreases on stereo SR due to stereo inconsistency. We propose a perceptual-oriented stereo SR framework that considers both single-view and cross-view information, noted as SC-NAFSSR. With NAF-SSR [3] as our backbone, we combine LPIPS-based perceptual loss and VGG-based perceptual loss for perceptual training. To improve stereo consistency, we perform supervision on each Stereo Cross-Attention Module (SCAM) with stereo consistency loss [27], which calculates photometric loss, smoothness loss, and cycle loss using the cycle-attention maps and valid masks of SCAM. Furthermore, we propose training strategies to fully exploit the performance on perceptual-oriented stereo SR. Both extensive experiments and ablation studies demonstrate the effectiveness of our proposed method. In particular, SC-NAFSSR outperforms the SOTA methods on Flickr1024 dataset [30]. In the NTIRE 2023 Stereo Image Super-Resolution Challenge Track 2 Perceptual & Bicubic [26], SC-NAFSSR ranked 2nd place on the leaderboard. Our source code is available at https://github.com/FVL2020/SC-NAFSSR.","PeriodicalId":355438,"journal":{"name":"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130898036","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Improving Cross-Domain Detection with Self-Supervised Learning
Authors: K. Li, Curtis Wigington, Chris Tensmeyer, Vlad I. Morariu, Handong Zhao, Varun Manjunatha, Nikolaos Barmpalios, Y. Fu
DOI: https://doi.org/10.1109/CVPRW59228.2023.00503
Venue: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), June 2023
Abstract: Cross-Domain Detection (XDD) aims to train a domain-adaptive object detector using unlabeled images from a target domain and labeled images from a source domain. Existing approaches achieve this either by transferring the style of source images to that of target images, or by aligning the features of images from the two domains. In this paper, rather than proposing another method along the existing lines, we introduce a new framework complementary to existing methods. Our framework unifies some popular Self-Supervised Learning (SSL) techniques (e.g., rotation angle prediction, strong/weak data augmentation, mean teacher modeling) and adapts them to the XDD task. Our basic idea is to leverage the unsupervised nature of these SSL techniques and apply them simultaneously across domains (source and target) and models (student and teacher). These SSL techniques can thus serve as shared bridges that facilitate knowledge transfer between domains. More importantly, as these techniques are independently applied in each domain, they are complementary to existing domain alignment techniques that rely on interactions between domains (e.g., adversarial alignment). We perform extensive analyses of these SSL techniques and show that they significantly improve the performance of existing methods. In addition, we reach comparable or even better performance than state-of-the-art methods when integrating our framework with a well-established older method.
Title: A Meta-learning Approach for Domain Generalisation across Visual Modalities in Vehicle Re-identification
Authors: E. Kamenou, J. M. D. Rincón, Paul Miller, Patricia Devlin-Hill
DOI: https://doi.org/10.1109/CVPRW59228.2023.00044
Venue: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), June 2023
Abstract: Recent advances in imaging technologies have enabled the use of infrared spectrum data for computer vision tasks previously addressed with traditional RGB data, such as re-identification. Infrared spectrum data can provide complementary and consistent visual information in situations of low visibility, such as night-time, or in adverse environments. However, the main issue preventing the training of multi-modal systems is the lack of available infrared spectrum data. To this end, it is important to create systems that can easily adapt to data of multiple modalities at inference time. In this paper, we propose a domain generalisation approach for multi-modal vehicle re-identification based on the recent success of meta-learning training approaches, and we evaluate the ability of the model to generalise to unseen modality data at test time. In our experiments we use RGB, near-infrared, and thermal-infrared modalities from the RGBNT100 dataset and show that our meta-learning training configuration improves the generalisation ability of the trained model compared to traditional training settings.
{"title":"Systematic Architectural Design of Scale Transformed Attention Condenser DNNs via Multi-Scale Class Representational Response Similarity Analysis","authors":"Andrew Hryniowski, Alexander Wong","doi":"10.1109/CVPRW59228.2023.00223","DOIUrl":"https://doi.org/10.1109/CVPRW59228.2023.00223","url":null,"abstract":"Self-attention mechanisms are commonly included in a convolutional neural networks to achieve an improved efficiency performance balance. However, adding self-attention mechanisms adds additional hyperparameters to tune for the application at hand. In this work we propose a novel type of DNN analysis called Multi-Scale Class Representational Response Similarity Analysis (ClassRep-Sim) which can be used to identify specific design interventions that lead to more efficient self-attention convolutional neural network architectures. Using insights grained from ClassRepSim we propose the Spatial Transformed Attention Condenser (STAC) module, a novel attention-condenser based self-attention module. We show that adding STAC modules to ResNet style architectures can result in up to a 1.6% increase in top-1 accuracy compared to vanilla ResNet models and up to a 0.5% increase in top-1 accuracy compared to SENet models on the ImageNet64x64 dataset, at the cost of up to 1.7% increase in FLOPs and 2x the number of parameters. In addition, we demonstrate that results from ClassRepSim analysis can be used to select an effective parameterization of the STAC module resulting in competitive performance compared to an extensive parameter search.","PeriodicalId":355438,"journal":{"name":"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125426793","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Real-Time Estimation of Heart Rate in Situations Characterized by Dynamic Illumination using Remote Photoplethysmography
Authors: Patrik Hansen, Marianela García Lozano, Farzad Kamrani, J. Brynielsson
DOI: https://doi.org/10.1109/CVPRW59228.2023.00649
Venue: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), June 2023
Abstract: Remote photoplethysmography (rPPG) is a technique that aims to remotely estimate the heart rate of an individual using an RGB camera. Although several studies use the rPPG methodology, it is usually applied in a controlled laboratory environment, where both the camera and the subject are static and the illumination is ideal for the task. However, applying rPPG in a real-life scenario is much more demanding, since dynamic illumination issues arise. The work presented in this paper introduces a framework to estimate the heart rate of an individual in real time using an RGB camera in situations characterized by dynamic illumination. Such situations occur, for example, when either the camera or the subject is moving and/or face visibility is limited. The framework uses a face detection program to extract regions of interest on an individual's face. These regions are combined and constitute the input to a convolutional neural network, which is trained to estimate the heart rate in real time. The method is evaluated on three publicly available datasets and on an in-house dataset collected specifically for this study, which includes motion and dynamic illumination. The method shows good performance on all four datasets, outperforming other methods.
Title: Reparameterized Residual Feature Network For Lightweight Image Super-Resolution
Authors: Weijian Deng, Hongjie Yuan, Lunhui Deng, Zeng-Rong Lu
DOI: https://doi.org/10.1109/CVPRW59228.2023.00172
Venue: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), June 2023
Abstract: To address the problem of deploying super-resolution technology on resource-limited devices, this paper explores the differences in performance and efficiency between the information distillation and residual learning mechanisms used in lightweight super-resolution, and proposes a lightweight super-resolution network structure based on reparameterization, named RepRFN, which can effectively reduce GPU memory consumption and improve inference speed. A multi-scale feature fusion structure is designed so that the network can learn and integrate features of various scales and high-frequency edges. We reconsidered the redundancy in the overall network framework and removed redundant modules, with minimal impact on overall performance, to further reduce model complexity. In addition, we introduced a Fourier-transform-based loss that maps images from the spatial domain to the frequency domain, so that the network is also supervised on the frequency content of the image. The experimental results show that RepRFN achieves relatively low complexity while maintaining competitive performance, which facilitates deployment on edge devices. Code is available at https://github.com/laonafahaodange/RepRFN.
Title: Lightweight Real-Time Image Super-Resolution Network for 4K Images
Authors: G. Gankhuyag, Kuk-jin Yoon, Haeng Jinman Park, Seon Son, Kyoungwon Min
DOI: https://doi.org/10.1109/CVPRW59228.2023.00175
Venue: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), June 2023
Abstract: Single-image super-resolution technology has become a topic of extensive research in various applications, aiming to enhance the quality and resolution of degraded images obtained from low-resolution sensors. However, most existing studies on single-image super-resolution have primarily focused on developing deep learning networks operating on high-performance graphics processing units. Therefore, this study proposes a lightweight real-time image super-resolution network for 4K images. Furthermore, we applied a reparameterization method to improve network performance without incurring additional computational costs. The experimental results demonstrate that the proposed network achieves a PSNR of 30.15 dB and an inference time of 4.75 ms on an RTX 3090Ti device, as evaluated on the NTIRE 2023 Real-Time Super-Resolution validation scale X3 dataset. The code is available at https://github.com/Ganzooo/LRSRN.git.
Title: NTIRE 2023 Challenge on Efficient Super-Resolution: Methods and Results
Authors: Yawei Li, Yulun Zhang, R. Timofte, L. Gool, Lei Yu, Youwei Li, Xinpeng Li, Ting Jiang, Qi Wu, Mingyan Han, Wenjie Lin, Chen Jiang, Jinting Luo, Haoqiang Fan, Shuaicheng Liu, Yucong Wang, Minjie Cai, Mingxi Li, Yuhang Zhang, Xi Fan, Yankai Sheng, Yanyu Mao, Qixuan Cai, Xinan Dai, Magauiya Zhussip, Nikolay Kalyazin, Dmitry Vyal, X. Zou, Youliang Yan, Heaseo Chung, Jin Zhang, G. Yu, Feng Zhang, Hongbin Wang, Bohao Liao, Zhibo Du, Yulan Wu, Gege Shi, Long Peng, Yang Wang, Yang Cao, Zhengjun Zha, Zhi-Kai Huang, Yi-Chung Chen, Yuan Chiang, Hao Yang, Wei-Ting Chen, Hua-En Chang, I-Hsiang Chen, Chia-Hsuan Hsieh, Sy-Yen Kuo, Xin Liu, Qian Wang, Jiahao Pan, Hong Weichen, Yu Ge, Jia Dong, Yajun Zou, Zhuoyuan Wu, B. Han, Xiaolin Zhang, He Zhang, X. Yin, Kun Zuo, Wei Deng, Hongjie Yuan, Zeng-Rong Lu, Mingyu Ouyang, Wenzhuo Ma, Nian Liu, Hanyou Zheng, Yuan Zhang, Junxi Zhang, Zhenzhong Chen, Garas Gendy, Nabil Sabor, Jingchao Hou, Guang-liang He, Yurui Zhu, Xi Wang, Xueyang Fu, Daheng Yin, Mengyang Liu, Baijun Chen, Ao
DOI: https://doi.org/10.1109/CVPRW59228.2023.00189
Venue: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), June 2023
Abstract: This paper reviews the NTIRE 2023 challenge on efficient single-image super-resolution, with a focus on the proposed solutions and results. The aim of the challenge is to devise a network that reduces one or several aspects of RFDN, such as runtime, parameters, FLOPs, activations, memory footprint, and depth, while at least maintaining a PSNR of 29.00 dB on the DIV2K validation dataset. The challenge had 272 registered participants, and 35 teams made valid submissions. Together, these submissions gauge the state of the art in efficient single-image super-resolution.
{"title":"Multi-scale Local Implicit Keypoint Descriptor for Keypoint Matching","authors":"Jongmin Lee, Eunhyeok Park, S. Yoo","doi":"10.1109/CVPRW59228.2023.00654","DOIUrl":"https://doi.org/10.1109/CVPRW59228.2023.00654","url":null,"abstract":"We investigate the potential of multi-scale descriptors which has been under-explored in the existing literature. At the pixel level, we propose utilizing both coarse and fine-grained descriptors and present a scale-aware method of negative sampling, which trains descriptors at different scales in a complementary manner, thereby improving their discriminative power. For sub-pixel level descriptors, we also propose adopting coordinate-based implicit modeling and learning the non-linearity of local descriptors on continuous-domain coordinates. Our experiments show that the proposed method achieves state-of-the-art performance on various tasks, i.e., image matching, relative pose estimation, and visual localization.","PeriodicalId":355438,"journal":{"name":"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"256 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121177770","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}