Title: Tracked-Vehicle Retrieval by Natural Language Descriptions with Multi-Contextual Adaptive Knowledge
Authors: H. Le, Quang Qui-Vinh Nguyen, Duc Trung Luu, Truc Thi-Thanh Chau, Nhat Minh Chung, Synh Viet-Uyen Ha
DOI: 10.1109/CVPRW59228.2023.00583
Published in: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), June 2023
Abstract: This paper introduces our solution for Track 2 of the AI City Challenge 2023. The task is tracked-vehicle retrieval by natural language descriptions on a real-world dataset spanning various scenarios and cameras. Our solution focuses on four points: (1) To address linguistic ambiguity in the language queries, we leverage our proposed standardized version of the text descriptions in both the domain-adaptive training and the post-processing stage. (2) Our baseline vehicle retrieval model utilizes CLIP to extract robust visual and textual feature representations and to learn unified cross-modal representations between textual and visual features. (3) Our proposed semi-supervised domain adaptive (SSDA) training method addresses the domain gap between the train and test sets. (4) Finally, we propose a multi-contextual post-processing technique that prunes out wrong results based on multi-contextual attribute information, effectively boosting the final retrieval results. Our proposed framework yields a competitive performance of 82.63% MRR on the test set, achieving 1st place in the competition. Codes will be available at https://github.com/zef1611/AIC23_NLRetrieval_HCMIU_CVIP

Title: Mixer-based Local Residual Network for Lightweight Image Super-resolution
Authors: Garas Gendy, Nabil Sabor, Jingchao Hou, Guang-liang He
DOI: 10.1109/CVPRW59228.2023.00161
Published in: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), June 2023
Abstract: Recently, single image super-resolution (SISR) based on deep learning has attracted increasing attention from the research community. Many CNN-based methods have been developed to solve this task, but most of them require large computational resources and long runtimes. Because runtime is essential for some applications, we propose a mixer-based local residual network (MLRN) for lightweight image super-resolution (SR). The idea of MLRN is to mix channel and spatial features and to mix low- and high-frequency information. This is done by designing a mixer local residual block (MLRB) that serves as the backbone of our model. Moreover, bilinear up-sampling is utilized to transfer low-frequency information and mix it with the extracted high-frequency information. In addition, the GELU activation is used in the main model, proving its efficiency for the SR task. The experimental results show the effectiveness of the model against other state-of-the-art lightweight models. Finally, we took part in the Efficient Super-Resolution 2023 Challenge and achieved good results.

Title: Exploring the Effectiveness of Lightweight Architectures for Face Anti-Spoofing
Authors: Yoanna Martínez-Díaz, Heydi Mendez Vazquez, Luis S. Luevano, M. González-Mendoza
DOI: 10.1109/CVPRW59228.2023.00680
Published in: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), June 2023
Abstract: Detecting spoof faces is crucial in ensuring the robustness of face-based identity recognition and access control systems, as faces can be captured easily without the user's cooperation in uncontrolled environments. Several deep models have been proposed for this task, achieving high levels of accuracy but at a high computational cost. Considering the very good results obtained by lightweight deep networks on different computer vision tasks, in this work we explore the effectiveness of this kind of architecture for face anti-spoofing. Specifically, we assess the performance of three lightweight face models on two challenging benchmark databases. The conducted experiments indicate that face anti-spoofing solutions based on lightweight face models achieve accuracy comparable to that of state-of-the-art very deep models, with significantly lower computational complexity.

Title: CrisisHateMM: Multimodal Analysis of Directed and Undirected Hate Speech in Text-Embedded Images from Russia-Ukraine Conflict
Authors: Aashish Bhandari, S. Shah, Surendrabikram Thapa, Usman Naseem, Mehwish Nasim
DOI: 10.1109/CVPRW59228.2023.00193
Published in: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), June 2023
Abstract: Text-embedded images are frequently used on social media to convey opinions and emotions, but they can also be a medium for disseminating hate speech, propaganda, and extremist ideologies. During the Russia-Ukraine war, both sides used text-embedded images extensively to spread propaganda and hate speech. To aid in moderating such content, this paper introduces CrisisHateMM, a novel multimodal dataset of over 4,700 text-embedded images from the Russia-Ukraine conflict, annotated for hate and non-hate speech. The hate speech is annotated for directed and undirected hate speech, with directed hate speech further annotated for individual, community, and organizational targets. We benchmark the dataset using unimodal and multimodal algorithms, providing insights into the effectiveness of different approaches for detecting hate speech in text-embedded images. Our results show that multimodal approaches outperform unimodal approaches in detecting hate speech, highlighting the importance of combining visual and textual features. This work provides a valuable resource for researchers and practitioners in automated content moderation and social media analysis. The CrisisHateMM dataset and codes are made publicly available at https://github.com/aabhandari/CrisisHateMM.

Title: Reliable Student: Addressing Noise in Semi-Supervised 3D Object Detection
Authors: Farzad Nozarian, Shashank Agarwal, Farzaneh Rezaeianaran, Danish Shahzad, Atanas Poibrenski, Christian Müller, P. Slusallek
DOI: 10.1109/CVPRW59228.2023.00526
Published in: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), June 2023
Abstract: Semi-supervised 3D object detection can benefit from the promising pseudo-labeling technique when labeled data is limited. However, recent approaches have overlooked the impact of noisy pseudo-labels during training, despite efforts to enhance pseudo-label quality through confidence-based filtering. In this paper, we examine the impact of noisy pseudo-labels on IoU-based target assignment and propose the Reliable Student framework, which incorporates two complementary approaches to mitigate errors. First, it involves a class-aware target assignment strategy that reduces false negative assignments in difficult classes. Second, it includes a reliability weighting strategy that suppresses false positive assignment errors while also addressing remaining false negatives from the first step. The reliability weights are determined by querying the teacher network for confidence scores of the student-generated proposals. Our work surpasses the previous state-of-the-art on the KITTI 3D object detection benchmark on point clouds in the semi-supervised setting. On 1% labeled data, our approach achieves a 6.2% AP improvement for the pedestrian class, despite having only 37 labeled samples available. The improvements become significant for the 2% setting, achieving 6.0% AP and 5.7% AP improvements for the pedestrian and cyclist classes, respectively. Our code will be released at https://github.com/fnozarian/ReliableStudent

Title: A2-Aug: Adaptive Automated Data Augmentation
Authors: Lujun Li, Zheng Hua Zhu, Guan Huang, Dalong Du, Jiwen Lu, Jie Zhou, Qingyi Gu
DOI: 10.1109/CVPRW59228.2023.00221
Published in: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), June 2023
Abstract: Data augmentation is a promising way to enhance the generalization ability of deep learning models. Many proxy-free and proxy-based automated augmentation methods have been proposed to search for the best augmentation for target datasets. However, the proxy-free methods require a lot of searching overhead, while the proxy-based methods introduce optimization gaps with the actual task. In this paper, we explore a new proxy-free approach that only needs a small number of searches (~5 vs. 100 for RandAugment) to alleviate these issues. Specifically, we propose Adaptive Automated Augmentation (A2-Aug), a simple and effective proxy-free framework, which seeks to mine the adaptive ensemble knowledge of multiple augmentations to further improve the adaptability of each candidate augmentation. Firstly, A2-Aug automatically learns the ensemble logit from multiple candidate augmentations, which is jointly optimized and adaptive to target tasks. Secondly, the adaptive ensemble logit is used to distill the logit of each input augmentation via KL divergence. In this way, these few candidate augmentations can implicitly learn strong adaptability to the target datasets, enjoying effects similar to the many searches of RandAugment. Finally, equipped with joint training via separate BatchNorm and normalized distillation, A2-Aug obtains state-of-the-art performance with a smaller training budget. In experiments, our A2-Aug achieves a 4% performance gain on CIFAR-100, substantially outperforming other methods. On ImageNet, we obtain a top-1 accuracy of 79.2% for ResNet-50, a 1.6% boost over AutoAugment with at least 25× faster training speed.

Title: DIFT: Dynamic Iterative Field Transforms for Memory Efficient Optical Flow
Authors: Risheek Garrepalli, Jisoo Jeong, R. C. Ravindran, J. Lin, F. Porikli
DOI: 10.1109/CVPRW59228.2023.00216
Published in: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), June 2023
Abstract: Recent advancements in neural network-based optical flow estimation often come with prohibitively high computational and memory requirements, presenting challenges in their adaptation for mobile and low-power use cases. In this paper, we introduce a lightweight, low-latency and memory-efficient model, Dynamic Iterative Field Transforms (DIFT), for optical flow estimation feasible for edge applications such as mobile, XR, micro UAVs, robotics and cameras. DIFT follows an iterative refinement framework leveraging variable-resolution cost volumes for correspondence estimation. We propose a memory-efficient solution for cost volume processing to reduce peak memory. We also present a novel dynamic coarse-to-fine cost volume processing scheme applied during various stages of refinement to avoid maintaining multiple levels of cost volumes. We demonstrate the first real-time cost-volume-based optical flow DL architecture on the Snapdragon 8 Gen 1 HTP efficient mobile AI accelerator, achieving 32 inf/sec and 5.89 EPE (endpoint error) on KITTI with manageable accuracy-performance tradeoffs.

Title: Multi-camera People Tracking With Mixture of Realistic and Synthetic Knowledge
Authors: Quang Qui-Vinh Nguyen, H. Le, Truc Thi-Thanh Chau, Duc-Tuan Luu, Nhat Minh Chung, Synh Viet-Uyen Ha
DOI: 10.1109/CVPRW59228.2023.00581
Published in: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), June 2023
Abstract: This paper presents a solution for Track 1 of the AI City Challenge 2023, which involves multi-camera people tracking in indoor scenarios. The proposed framework comprises the following modules: vehicle detection, ReID feature extraction, single-camera multi-target tracking (SCMT), single-camera matching, and multi-camera matching. A significant contribution of our approach is the introduction of ID switch detection and ID switch splitting using the Gaussian mixture model, which efficiently addresses the problem of tracklets with ID switches. Furthermore, our system performs well in matching both synthetic and real data. The proposed R-matching algorithm performs exceptionally well in real scenarios despite being trained on synthetic data. Experimental results on the public test set of the 2023 AI City Challenge Track 1 demonstrate the efficacy of the proposed approach, achieving an IDF1 of 94.17% and securing 2nd position on the leaderboard. Codes will be available at https://github.com/nguyenquivinhquang/Multi-camera-People-Tracking-With-Mixture-of-Realistic-and-Synthetic-Knowledge

{"title":"Adversarial Domain Generalization for Surveillance Face Anti-Spoofing","authors":"Yongluo Liu, Yaowen Xu, Zhaofan Zou, Zhuming Wang, Bowen Zhang, Lifang Wu, Zhizhi Guo, Zhixiang He","doi":"10.1109/CVPRW59228.2023.00676","DOIUrl":"https://doi.org/10.1109/CVPRW59228.2023.00676","url":null,"abstract":"In traditional scenes (short-distance applications), the current Face Anti-Spoofing (FAS) methods have achieved satisfactory performance. However, in surveillance scenes (long-distance applications), those methods cannot be generalized well due to the deviation in image quality. Some methods attempt to recover lost details from low-quality images through image reconstruction, but unknown image degradation results in suboptimal performance. In this paper, we regard image quality degradation as a domain generalization problem. Specifically, we propose an end-to-end Adversarial Domain Generalization Network (ADGN) to improve the generalization of FAS. We first divide the accessible training data into multiple sub-source domains based on image quality scores. Then, a feature extractor and a domain discriminator are trained to make the extracted features from different sub-source domains undistinguishable (i.e., quality-invariant features), thus forming an adversarial learning procedure. At the same time, we have introduced the transfer learning strategy to address the problem of insufficient training data. Our method won second place in \"Track Surveillance Face Anti-spoofing\" of the 4th Face Anti-spoofing Challenge@CVPR2023. Our final submission obtains 9.21% APCER, 1.90% BPCER, and 5.56% ACER, respectively.","PeriodicalId":355438,"journal":{"name":"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121423059","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: PRB-FPN+: Video Analytics for Enforcing Motorcycle Helmet Laws
Authors: Bo Wang, Ping-Yang Chen, Yi-Kuan Hsieh, J. Hsieh, Ming-Ching Chang, JiaXin He, Shin-You Teng, HaoYuan Yue, Yu-Chee Tseng
DOI: 10.1109/CVPRW59228.2023.00579
Published in: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), June 2023
Abstract: We present a video analytics system for enforcing motorcycle helmet regulations as our entry to the AI City Challenge 2023 [18] Track 5 contest. The advent of powerful object detectors enables real-time localization of road users and even the ability to determine whether a motorcyclist or a rider is wearing a helmet. Ensuring road safety is important, as helmets effectively provide protection against severe injuries and fatalities. However, monitoring and enforcing helmet compliance is challenging, given the large number of motorcyclists and limited visual input due to factors such as occlusion. To address these challenges, we propose a novel two-step approach. First, we introduce PRB-FPN+, a state-of-the-art detector that excels in object localization. We also explore the benefits of deep supervision by incorporating auxiliary heads within the network, leading to enhanced performance of our deep learning architectures. Second, we utilize an advanced tracker named SMILEtrack to associate and refine the target tracklets. Comprehensive experimental results demonstrate that PRB-FPN+ outperforms state-of-the-art detectors on MS-COCO. Our system achieved a rank of 8 on the AI City Challenge 2023 [18] Track 5 Public Leaderboard. Code implementation is available at: https://github.com/NYCU-AICVLab/AICITY_2023_Track5.
