Title: Tracked-Vehicle Retrieval by Natural Language Descriptions with Multi-Contextual Adaptive Knowledge
Authors: H. Le, Quang Qui-Vinh Nguyen, Duc Trung Luu, Truc Thi-Thanh Chau, Nhat Minh Chung, Synh Viet-Uyen Ha
DOI: 10.1109/CVPRW59228.2023.00583
Published in: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), June 2023
Abstract: This paper introduces our solution for Track 2 of the AI City Challenge 2023. The task is tracked-vehicle retrieval by natural language descriptions on a real-world dataset spanning various scenarios and cameras. Our solution focuses on four points: (1) To address linguistic ambiguity in the language queries, we leverage our proposed standardized version of the text descriptions in both the domain-adaptive training and the post-processing stage. (2) Our baseline vehicle retrieval model utilizes CLIP to extract robust visual and textual feature representations and to learn unified cross-modal representations between textual and visual features. (3) Our proposed semi-supervised domain adaptive (SSDA) training method addresses the domain gap between the train and test sets. (4) Finally, we propose a multi-contextual post-processing technique that prunes out wrong results based on multi-contextual attribute information, effectively boosting the final retrieval results. Our proposed framework yields a competitive performance of 82.63% MRR on the test set, achieving 1st place in the competition. Codes will be available at https://github.com/zef1611/AIC23_NLRetrieval_HCMIU_CVIP

Title: Mixer-based Local Residual Network for Lightweight Image Super-resolution
Authors: Garas Gendy, Nabil Sabor, Jingchao Hou, Guang-liang He
DOI: 10.1109/CVPRW59228.2023.00161
Published in: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), June 2023
Abstract: Recently, single image super-resolution (SISR) based on deep learning has attracted increasing attention from the research community. Many CNN-based methods have been developed to solve this task, but most of them require large computational resources and long runtimes. Because runtime is essential for some applications, we propose a mixer-based local residual network (MLRN) for lightweight image super-resolution (SR). The idea of MLRN is to mix channel and spatial features and to mix low- and high-frequency information. This is done by designing a mixer local residual block (MLRB) that serves as the backbone of our model. Moreover, bilinear up-sampling is utilized to transfer low-frequency information and mix it with the extracted high-frequency information. In addition, the GELU activation is used in the main model, proving its efficiency for the SR task. The experimental results show the effectiveness of the model against other state-of-the-art lightweight models. Finally, we took part in the Efficient Super-Resolution 2023 Challenge and achieved good results.

Title: Exploring the Effectiveness of Lightweight Architectures for Face Anti-Spoofing
Authors: Yoanna Martínez-Díaz, Heydi Mendez Vazquez, Luis S. Luevano, M. González-Mendoza
DOI: 10.1109/CVPRW59228.2023.00680
Published in: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), June 2023
Abstract: Detecting spoof faces is crucial in ensuring the robustness of face-based identity recognition and access control systems, as faces can be captured easily without the user's cooperation in uncontrolled environments. Several deep models have been proposed for this task, achieving high levels of accuracy but at a high computational cost. Considering the very good results obtained by lightweight deep networks on different computer vision tasks, in this work we explore the effectiveness of this kind of architecture for face anti-spoofing. Specifically, we assess the performance of three lightweight face models on two challenging benchmark databases. The conducted experiments indicate that face anti-spoofing solutions based on lightweight face models achieve accuracy comparable to that of state-of-the-art very deep models, with significantly lower computational complexity.

Title: CrisisHateMM: Multimodal Analysis of Directed and Undirected Hate Speech in Text-Embedded Images from Russia-Ukraine Conflict
Authors: Aashish Bhandari, S. Shah, Surendrabikram Thapa, Usman Naseem, Mehwish Nasim
DOI: 10.1109/CVPRW59228.2023.00193
Published in: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), June 2023
Abstract: Text-embedded images are frequently used on social media to convey opinions and emotions, but they can also be a medium for disseminating hate speech, propaganda, and extremist ideologies. During the Russia-Ukraine war, both sides used text-embedded images extensively to spread propaganda and hate speech. To aid in moderating such content, this paper introduces CrisisHateMM, a novel multimodal dataset of over 4,700 text-embedded images from the Russia-Ukraine conflict, annotated for hate and non-hate speech. The hate speech is annotated for directed and undirected hate speech, with directed hate speech further annotated for individual, community, and organizational targets. We benchmark the dataset using unimodal and multimodal algorithms, providing insights into the effectiveness of different approaches for detecting hate speech in text-embedded images. Our results show that multimodal approaches outperform unimodal approaches in detecting hate speech, highlighting the importance of combining visual and textual features. This work provides a valuable resource for researchers and practitioners in automated content moderation and social media analysis. The CrisisHateMM dataset and codes are made publicly available at https://github.com/aabhandari/CrisisHateMM.

Title: Reliable Student: Addressing Noise in Semi-Supervised 3D Object Detection
Authors: Farzad Nozarian, Shashank Agarwal, Farzaneh Rezaeianaran, Danish Shahzad, Atanas Poibrenski, Christian Müller, P. Slusallek
DOI: 10.1109/CVPRW59228.2023.00526
Published in: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), June 2023
Abstract: Semi-supervised 3D object detection can benefit from the promising pseudo-labeling technique when labeled data is limited. However, recent approaches have overlooked the impact of noisy pseudo-labels during training, despite efforts to enhance pseudo-label quality through confidence-based filtering. In this paper, we examine the impact of noisy pseudo-labels on IoU-based target assignment and propose the Reliable Student framework, which incorporates two complementary approaches to mitigate errors. First, it involves a class-aware target assignment strategy that reduces false negative assignments in difficult classes. Second, it includes a reliability weighting strategy that suppresses false positive assignment errors while also addressing remaining false negatives from the first step. The reliability weights are determined by querying the teacher network for confidence scores of the student-generated proposals. Our work surpasses the previous state-of-the-art on the KITTI 3D object detection benchmark on point clouds in the semi-supervised setting. On 1% labeled data, our approach achieves a 6.2% AP improvement for the pedestrian class, despite having only 37 labeled samples available. The improvements become significant for the 2% setting, achieving 6.0% AP and 5.7% AP improvements for the pedestrian and cyclist classes, respectively. Our code will be released at https://github.com/fnozarian/ReliableStudent

Title: A2-Aug: Adaptive Automated Data Augmentation
Authors: Lujun Li, Zheng Hua Zhu, Guan Huang, Dalong Du, Jiwen Lu, Jie Zhou, Qingyi Gu
DOI: 10.1109/CVPRW59228.2023.00221
Published in: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), June 2023
Abstract: Data augmentation is a promising way to enhance the generalization ability of deep learning models. Many proxy-free and proxy-based automated augmentation methods have been proposed to search for the best augmentation for target datasets. However, the proxy-free methods require a lot of searching overhead, while the proxy-based methods introduce optimization gaps with the actual task. In this paper, we explore a new proxy-free approach that only needs a small number of searches (~5 vs. 100 for RandAugment) to alleviate these issues. Specifically, we propose Adaptive Automated Augmentation (A2-Aug), a simple and effective proxy-free framework, which seeks to mine the adaptive ensemble knowledge of multiple augmentations to further improve the adaptability of each candidate augmentation. Firstly, A2-Aug automatically learns the ensemble logit from multiple candidate augmentations, which is jointly optimized and adaptive to target tasks. Secondly, the adaptive ensemble logit is used to distill the logit of each input augmentation via KL divergence. In this way, these few candidate augmentations can implicitly learn strong adaptability to the target datasets, enjoying effects similar to the many searches of RandAugment. Finally, equipped with joint training via separate BatchNorm and normalized distillation, A2-Aug obtains state-of-the-art performance with a smaller training budget. In experiments, our A2-Aug achieves a 4% performance gain on CIFAR-100, substantially outperforming other methods. On ImageNet, we obtain a top-1 accuracy of 79.2% for ResNet-50, a 1.6% boost over AutoAugment with at least 25× faster training speed.

Title: DIFT: Dynamic Iterative Field Transforms for Memory Efficient Optical Flow
Authors: Risheek Garrepalli, Jisoo Jeong, R. C. Ravindran, J. Lin, F. Porikli
DOI: 10.1109/CVPRW59228.2023.00216
Published in: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), June 2023
Abstract: Recent advancements in neural network-based optical flow estimation often come with prohibitively high computational and memory requirements, presenting challenges in their adaptation for mobile and low-power use cases. In this paper, we introduce a lightweight, low-latency and memory-efficient model, Dynamic Iterative Field Transforms (DIFT), for optical flow estimation feasible for edge applications such as mobile, XR, micro UAVs, robotics and cameras. DIFT follows an iterative refinement framework leveraging variable-resolution cost volumes for correspondence estimation. We propose a memory-efficient solution for cost volume processing to reduce peak memory. We also present a novel dynamic coarse-to-fine cost volume processing scheme applied during various stages of refinement to avoid maintaining multiple levels of cost volumes. We demonstrate the first real-time cost-volume-based optical flow DL architecture on the Snapdragon 8 Gen 1 HTP efficient mobile AI accelerator, achieving 32 inf/sec and 5.89 EPE (endpoint error) on KITTI with manageable accuracy-performance tradeoffs.

Title: Multi-camera People Tracking With Mixture of Realistic and Synthetic Knowledge
Authors: Quang Qui-Vinh Nguyen, H. Le, Truc Thi-Thanh Chau, Duc-Tuan Luu, Nhat Minh Chung, Synh Viet-Uyen Ha
DOI: 10.1109/CVPRW59228.2023.00581
Published in: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), June 2023
Abstract: This paper presents a solution for Track 1 of the AI City Challenge 2023, which involves multi-camera people tracking in indoor scenarios. The proposed framework comprises the following modules: vehicle detection, ReID feature extraction, single-camera multi-target tracking (SCMT), single-camera matching, and multi-camera matching. A significant contribution of our approach is the introduction of ID switch detection and ID switch splitting using the Gaussian mixture model, which efficiently addresses the problem of tracklets with ID switches. Furthermore, our system performs well in matching both synthetic and real data. The proposed R-matching algorithm performs exceptionally well in real scenarios despite being trained on synthetic data. Experimental results on the public test set of the 2023 AI City Challenge Track 1 demonstrate the efficacy of the proposed approach, achieving an IDF1 of 94.17% and securing 2nd position on the leaderboard. Codes will be available at https://github.com/nguyenquivinhquang/Multi-camera-People-Tracking-With-Mixture-of-Realistic-and-Synthetic-Knowledge

{"title":"Adversarial Domain Generalization for Surveillance Face Anti-Spoofing","authors":"Yongluo Liu, Yaowen Xu, Zhaofan Zou, Zhuming Wang, Bowen Zhang, Lifang Wu, Zhizhi Guo, Zhixiang He","doi":"10.1109/CVPRW59228.2023.00676","DOIUrl":"https://doi.org/10.1109/CVPRW59228.2023.00676","url":null,"abstract":"In traditional scenes (short-distance applications), the current Face Anti-Spoofing (FAS) methods have achieved satisfactory performance. However, in surveillance scenes (long-distance applications), those methods cannot be generalized well due to the deviation in image quality. Some methods attempt to recover lost details from low-quality images through image reconstruction, but unknown image degradation results in suboptimal performance. In this paper, we regard image quality degradation as a domain generalization problem. Specifically, we propose an end-to-end Adversarial Domain Generalization Network (ADGN) to improve the generalization of FAS. We first divide the accessible training data into multiple sub-source domains based on image quality scores. Then, a feature extractor and a domain discriminator are trained to make the extracted features from different sub-source domains undistinguishable (i.e., quality-invariant features), thus forming an adversarial learning procedure. At the same time, we have introduced the transfer learning strategy to address the problem of insufficient training data. Our method won second place in \"Track Surveillance Face Anti-spoofing\" of the 4th Face Anti-spoofing Challenge@CVPR2023. Our final submission obtains 9.21% APCER, 1.90% BPCER, and 5.56% ACER, respectively.","PeriodicalId":355438,"journal":{"name":"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121423059","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: PRB-FPN+: Video Analytics for Enforcing Motorcycle Helmet Laws
Authors: Bo Wang, Ping-Yang Chen, Yi-Kuan Hsieh, J. Hsieh, Ming-Ching Chang, JiaXin He, Shin-You Teng, HaoYuan Yue, Yu-Chee Tseng
DOI: 10.1109/CVPRW59228.2023.00579
Published in: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), June 2023
Abstract: We present a video analytics system for enforcing motorcycle helmet regulations as our entry to the AI City Challenge 2023 [18] Track 5 contest. The advent of powerful object detectors enables real-time localization of road users and even the ability to determine whether a motorcyclist or a rider is wearing a helmet. Ensuring road safety is important, as helmets effectively provide protection against severe injuries and fatalities. However, monitoring and enforcing helmet compliance is challenging, given the large number of motorcyclists and limited visual input due to factors such as occlusion. To address these challenges, we propose a novel two-step approach. First, we introduce PRB-FPN+, a state-of-the-art detector that excels in object localization. We also explore the benefits of deep supervision by incorporating auxiliary heads within the network, leading to enhanced performance of our deep learning architectures. Second, we utilize an advanced tracker named SMILEtrack to associate and refine the target tracklets. Comprehensive experimental results demonstrate that PRB-FPN+ outperforms state-of-the-art detectors on MS-COCO. Our system achieved a rank of 8 on the AI City Challenge 2023 [18] Track 5 Public Leaderboard. Code implementation is available at: https://github.com/NYCU-AICVLab/AICITY_2023_Track5.
