{"title":"XDNet: A Few-Shot Meta-Learning Approach for Cross-Domain Visual Inspection","authors":"Xian Yeow Lee, L. Vidyaratne, M. Alam, Ahmed K. Farahat, Dipanjan Ghosh, Teresa Gonzalez Diaz, Chetan Gupta","doi":"10.1109/CVPRW59228.2023.00460","DOIUrl":"https://doi.org/10.1109/CVPRW59228.2023.00460","url":null,"abstract":"Automated visual inspection has the potential to improve the efficiency and accuracy of inspection tasks across various industries. Deep learning models have been at the forefront of many automated visual inspection technologies. In this work, we focus on a specific instance of a visual inspection problem: the defect detection and classification problem. Training a deep learning model from scratch to detect defects is challenging due to the scarcity of labeled images with defects. Moreover, it is progressively more challenging to adapt a deep learning model across different domains using limited labeled data. We propose a cross-domain meta-learning framework, XDNet, to solve the defect classification problem using a few labeled samples. XDNet is inspired by recent advancements in pre-trained backbone models as general feature extractors and meta-learning frameworks, which adapt across different domains using non-parametric classifiers under limited computational resources. We demonstrate the efficacy of XDNet using a benchmark anomaly detection dataset which we re-formulate as a defect detection and classification problem. Experimental results suggest that XDNet performs significantly better (≈ 17%) than the existing state-of-the-art and baseline models. Additionally, we perform an ablation study to identify the important components that contribute to the improved performance of the proposed framework. Finally, we conduct a data domain-specific analysis to understand the potential strengths and drawbacks of XDNet on different types of defects.","PeriodicalId":355438,"journal":{"name":"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115661977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Category Differences Matter: A Broad Analysis of Inter-Category Error in Semantic Segmentation","authors":"Jingxing Zhou, Jürgen Beyerer","doi":"10.1109/CVPRW59228.2023.00401","DOIUrl":"https://doi.org/10.1109/CVPRW59228.2023.00401","url":null,"abstract":"In current evaluation schemes of semantic segmentation, metrics are calculated in such a way that all predicted classes should equally be identical to their ground truth, paying less attention to the various manifestations of the false predictions within the object category. In this work, we propose the Critical Error Rate (CER) as a supplement to the current evaluation metrics, focusing on the error rate, which reflects predictions that fall outside of the category from the ground truth. We conduct a series of experiments evaluating the behavior of different network architectures in various evaluation setups, including domain shift, the introduction of novel classes, and a mixture of these. We demonstrate the essential criteria for network generalization with those experiments. Furthermore, we ablate the impact of utilizing various class taxonomies for the evaluation of out-of-category error.","PeriodicalId":355438,"journal":{"name":"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116660859","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"OTST: A Two-Phase Framework for Joint Denoising and Remosaicing in RGBW CFA","authors":"Zhihao Fan, Xun Wu, Fanqing Meng, Yaqi Wu, Feng Zhang","doi":"10.1109/CVPRW59228.2023.00284","DOIUrl":"https://doi.org/10.1109/CVPRW59228.2023.00284","url":null,"abstract":"RGBW, a newly emerged type of Color Filter Array (CFA), possesses strong low-light photography capabilities. RGBW CFA shows significant application value when low-light sensitivity is critical, such as in security cameras and smartphones. However, the majority of commercial image signal processors (ISP) are primarily designed for Bayer CFA, research pertaining to RGBW CFA is very rare. To address above limitations, in this study, we propose a two-phase framework named OTST for the RGBW Joint Denoising and Remosaicing (RGBW-JRD) task. For the denoising stage, we propose Omni-dimensional Dynamic Convolution based Half-Shuffle Transformer (ODC-HST) which can fully utilize image’s long-range dependencies to dynamically remove the noise. For the remosaicing stage, we propose a Spatial Compressive Transformer (SCT) to efficiently capture both local and global dependencies across spatial and channel dimensions. Experimental results demonstrate that our two-phase RGBW-JRD framework outperforms existing RGBW denoising and remosaicing solutions across a wide range of noise levels. In addition, the proposed approach ranks the 2nd place in MIPI 2023 RGBW Joint Remosaic and Denoise competition.","PeriodicalId":355438,"journal":{"name":"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117174589","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Defending Low-Bandwidth Talking Head Videoconferencing Systems From Real-Time Puppeteering Attacks","authors":"Danial Samadi Vahdati, T. D. Nguyen, M. Stamm","doi":"10.1109/CVPRW59228.2023.00105","DOIUrl":"https://doi.org/10.1109/CVPRW59228.2023.00105","url":null,"abstract":"Talking head videos have gained significant attention in recent years due to advances in AI that allow for the synthesis of realistic videos from only a single image of the speaker. Recently, researchers have proposed low bandwidth talking head video systems for use in applications such as videoconferencing and video calls. However, these systems are vulnerable to puppeteering attacks, where an attacker can control a synthetic version of a different target speaker in real-time. This can be potentially used spread misinformation or committing fraud. Because the receiver always creates a synthetic video of the speaker, deepfake detectors cannot protect against these attacks. As a result, there are currently no defenses against puppeteering in these systems. In this paper, we propose a new defense against puppeteering attacks in low-bandwidth talking head video systems by utilizing the biometric information inherent in the facial expression and pose data transmitted to the receiver. Our proposed system requires no modifications to the video transmission system and operates with low computational cost. We present experimental evidence to demonstrate the effectiveness of our proposed defense and provide a new dataset for benchmarking defenses against puppeteering attacks.","PeriodicalId":355438,"journal":{"name":"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121123125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Event-based Blur Kernel Estimation For Blind Motion Deblurring","authors":"Takuya Nakabayashi, Kunihiro Hasegawa, M. Matsugu, H. Saito","doi":"10.1109/CVPRW59228.2023.00433","DOIUrl":"https://doi.org/10.1109/CVPRW59228.2023.00433","url":null,"abstract":"Motion blur can significantly reduce the quality of images, and researchers have developed various algorithms to address this issue. One common approach to deblurring is to use deconvolution to cancel out the blur effect, but this method is limited by the difficulty of accurately estimating blur kernels from blurred images. This is because the motion causing the blur is often complex and nonlinear. In this paper, a new method for estimating blur kernels is proposed. This method uses an event camera, which captures high-temporal-resolution data on pixel luminance changes, along with a conventional camera to capture the input blurred image. By analyzing the event data stream, the proposed method estimates the 2D motion of the blurred image at short intervals during the exposure time, and integrates this information to estimate a variety of complex blur motions. With the estimated blur kernel, the input blurred image can be deblurred using deconvolution. The proposed method does not rely on machine learning and therefore can restore blurry images without depending on the quality and quantity of training data. Experimental results show that the proposed method can estimate blur kernels even for images blurred by complex camera motions, outperforming conventional methods. Overall, this paper presents a promising approach to motion deblurring that could have practical applications in a range of fields.","PeriodicalId":355438,"journal":{"name":"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127454056","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring Video Frame Redundancies for Efficient Data Sampling and Annotation in Instance Segmentation","authors":"Jihun Yoon, Min-Kook Choi","doi":"10.1109/CVPRW59228.2023.00333","DOIUrl":"https://doi.org/10.1109/CVPRW59228.2023.00333","url":null,"abstract":"In recent years, deep neural network architectures and learning algorithms have greatly improved the performance of computer vision tasks. However, acquiring and annotating large-scale datasets for training such models can be expensive. In this work, we explore the potential of reducing dataset sizes by leveraging redundancies in video frames, specifically for instance segmentation. To accomplish this, we investigate two sampling strategies for extracting keyframes, uniform frame sampling with adjusted stride (UFS) and adaptive frame sampling (AFS), which employs visual (Optical flow, SSIM) or semantic (feature representations) dissimilarities measured by learning free methods. In addition, we show that a simple copy-paste augmentation can bridge the big mAP gap caused by frame reduction. We train and evaluate Mask R-CNN with the BDD100K MOTS dataset and verify the potential of reducing training data by extracting keyframes in the video. With only 20% of the data, we achieve similar performance to the full dataset mAP; with only 33% of the data, we surpass it. Lastly, based on our findings, we offer practical solutions for developing effective sampling methods and data annotation strategies for instance segmentation models. Supplementary on https://github.com/jihun-yoon/EVFR.","PeriodicalId":355438,"journal":{"name":"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124864432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SkiLL: Skipping Color and Label Landscape: Self Supervised Design Representations for Products in E-commerce","authors":"V. Verma, D. Sanny, S. Kulkarni, Prateek Sircar, Abhishek Singh, D. Gupta","doi":"10.1109/CVPRW59228.2023.00354","DOIUrl":"https://doi.org/10.1109/CVPRW59228.2023.00354","url":null,"abstract":"Understanding the design of a product without human supervision is a crucial task for e-commerce services. Such a capability can help in multiple downstream e-commerce tasks like product recommendations, design trend analysis, image-based search, and visual information retrieval, etc. For this task, getting fine-grain label data is costly and not scalable for the e-commerce product. In this paper, we leverage knowledge distillation based self-supervised learning (SSL) approach to learn design representations. These representations do not require human annotation for training and focus on only design related attributes of a product and ignore attributes like color, orientation, etc. We propose a global and task specific local augmentation space which captures the desired image information and provides robust visual embedding. We evaluated our model for the three highly diverse datasets, and also propose and measure a quantitative metric to evaluate the model’s color invariant feature learning ability. In all scenarios, our proposed approach outperforms the recent SSL model by upto 8.6% in terms of accuracy.","PeriodicalId":355438,"journal":{"name":"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126019497","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BlazeStyleGAN: A Real-Time On-Device StyleGAN","authors":"Haolin Jia, Qifei Wang, Omer Tov, Yang Zhao, Fei Deng, Lu Wang, Chuo-Ling Chang, Tingbo Hou, Matthias Grundmann","doi":"10.1109/CVPRW59228.2023.00495","DOIUrl":"https://doi.org/10.1109/CVPRW59228.2023.00495","url":null,"abstract":"StyleGAN models have been widely adopted for generating and editing face images. Yet, few work investigated running StyleGAN models on mobile devices. In this work, we introduce BlazeStyleGAN — to the best of our knowledge, the first StyleGAN model that can run in real-time on smartphones. We design an efficient synthesis network with the auxiliary head to convert features to RGB at each level of the generator, and only keep the last one at inference. We also improve the distillation strategy with a multi-scale perceptual loss using the auxiliary heads, and an adversarial loss for the student generator and discriminator. With these optimizations, BlazeStyleGAN can achieve real-time performance on high-end mobile GPUs. Experimental results demonstrate that BlazeStyleGAN generates high-quality face images and even mitigates some artifacts from the teacher model.","PeriodicalId":355438,"journal":{"name":"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125405527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scoring Your Prediction on Unseen Data","authors":"Yuhao Chen, Shen Zhang, Renjie Song","doi":"10.1109/CVPRW59228.2023.00330","DOIUrl":"https://doi.org/10.1109/CVPRW59228.2023.00330","url":null,"abstract":"The performance of deep neural networks can vary substantially when evaluated on datasets different from the training data. This presents a crucial challenge in evaluating models on unseen data without access to labels. Previous methods compute a single model-based indicator at the dataset level and use regression methods to predict performance. To evaluate the model more accurately, we propose a sample-level label-free model evaluation method for better prediction on unseen data, named Scoring Your Prediction (SYP). Specifically, SYP introduces low-level image-based features (e.g., blurriness) to model image quality that is important for classification. We complementarily combine model-based indicators and image-based indicators to enhance sample representation. Additionally, we predict the probability that each sample is correctly classified using a neural network named oracle model. Compared to other existing methods, the proposed method outperforms them on 40 unlabeled datasets transformed by CIFAR-10. Especially, SYP lowers RMSE by 1.83-3.97 for ResNet-56 evaluation and 2.32-9.74 for RepVGG-A0 evaluation compared with latest methods. Note that our scheme won the championship on the DataCV Challenge at CVPR 2023. Source code is avaliabe at https://github.com/megvii-research/SYP.","PeriodicalId":355438,"journal":{"name":"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126847428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fashion-Specific Ambiguous Expression Interpretation with Partial Visual-Semantic Embedding","authors":"Ryotaro Shimizu, Takuma Nakamura, M. Goto","doi":"10.1109/CVPRW59228.2023.00353","DOIUrl":"https://doi.org/10.1109/CVPRW59228.2023.00353","url":null,"abstract":"A novel technology named fashion intelligence system has been proposed to quantify ambiguous expressions unique to fashion, such as \"casual,\" \"adult-casual,\" and \"office-casual,\" and to support users’ understanding of fashion. However, the existing visual-semantic embedding (VSE) model, which is the basis of its system, does not support situations in which images are composed of multiple parts such as hair, tops, pants, skirts, and shoes. We propose partial VSE, which enables sensitive learning for each part of the fashion outfits. This enables five types of practical functionalities, particularly image-retrieval tasks in which changes are made only to the specified parts and image-reordering tasks that focus on the specified parts by the single model. Based on both the multiple unique qualitative and quantitative evaluation experiments, we show the effectiveness of the proposed model.","PeriodicalId":355438,"journal":{"name":"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116038344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}