{"title":"CheckSORT: Refined Synthetic Data Combination and Optimized SORT for Automatic Retail Checkout","authors":"Ziqiang Shi, Zhongling Liu, Liu Liu, Rujie Liu, Takuma Yamamoto, Xiaoyue Mi, Daisuke Uchida","doi":"10.1109/CVPRW59228.2023.00569","DOIUrl":"https://doi.org/10.1109/CVPRW59228.2023.00569","url":null,"abstract":"In this paper, we propose a method called CheckSORT for automatic retail checkout. We demonstrate CheckSORT on the multi-class product counting and recognition task in Track 4 of AI CITY CHALLENGE 2023. This task aims to count and identify products as they move along a retail checkout white tray, which is challenging due to occlusion, similar appearance, or blur. Based on the constraints and training data provided by the sponsor, we propose two new ideas to solve this task. The first idea is to design a controllable synthetic training data generation paradigm to bridge the gap between training data and real test videos as much as possible. The second innovation is to improve the efficiency of existing SORT tracking algorithms by proposing decomposed Kalman filter and dynamic tracklet feature sequence. Our experiments resulted in state-of-the-art (when compared with DeepSORT and StrongSORT) F1-scores of 70.3% and 62.1% on the TestA data of AI CITY CHALLENGE 2022 and 2023 respectively in the estimation of the time (in seconds) for the product to appear on the tray. Training and testing code will be available soon on github.","PeriodicalId":355438,"journal":{"name":"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"281 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133044417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Single Residual Network with ESA Modules and Distillation","authors":"Yucong Wang, Minjie Cai","doi":"10.1109/CVPRW59228.2023.00191","DOIUrl":"https://doi.org/10.1109/CVPRW59228.2023.00191","url":null,"abstract":"Although there are many methods based on deep learning that have superior performance on single image super-resolution (SISR), it is difficult to run in real time on devices with limited computing power. Some recent studies have found that simply relying on reducing parameters or reducing the theoretical FLOPs of the model does not speed up the inference time of the network in a practical sense. Actual speed on the device is probably a better measure than FLOPs. In this work, we propose a new single residual network (SRN). On the one hand, we try to introduce and optimize an attention mechanism module to improve the performance of the network with a relatively small speed loss. On the other hand, we find that residuals in residual blocks do not have a positive impact on networks with adjusted ESA. Therefore, the residual of the network residual block is removed, which not only improves the speed of the network, but also improves the performance of the network. Finally, we reduced the number of channels and the number of residual blocks of the classic model EDSR, and removed the last convolution before the long residual. We set this tuned EDSR as the teacher model and our newly proposed SRN as the student model. Under the joint effect of the original loss and the distillation loss, the performance of the network can be improved without losing the inference time. Combining the above strategies, our proposed model runs much faster than similarly performing models. As an example, we built a Fast and Efficient Network (SRN) and its small version SRN-S, which run 30%-37% faster than the state-of-the-art EISR model: a paper champion RLFN. Furthermore, the shallow version of SRN-S achieves the second-shortest inference time as well as the second-smallest number of activations in the NTIRE2023 challenge. Code will be available at https://github.com/wnxbwyc/SRN.","PeriodicalId":355438,"journal":{"name":"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114414428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Weakly Supervised Visual Question Answer Generation","authors":"Charani Alampalle, Shamanthak Hegde, Soumya Jahagirdar, Shankar Gangisetty","doi":"10.1109/CVPRW59228.2023.00591","DOIUrl":"https://doi.org/10.1109/CVPRW59228.2023.00591","url":null,"abstract":"Growing interest in conversational agents promote two-way human-computer communications involving asking and answering visual questions have become an active area of research in AI. Thus, generation of visual question-answer pair(s) becomes an important and challenging task. To address this issue, we propose a weakly-supervised visual question answer generation method that generates a relevant question-answer pairs for a given input image and associated caption. Most of the prior works are supervised and depend on the annotated question-answer datasets. In our work, we present a weakly supervised method that synthetically generates question-answer pairs procedurally from visual information and captions. The proposed method initially extracts list of answer words, then does nearest question generation that uses the caption and answer word to generate synthetic question. Next, the relevant question generator converts the nearest question to relevant language question by dependency parsing and in-order tree traversal, finally, fine-tune a ViLBERT model with the question-answer pair(s) generated at end. We perform an exhaustive experimental analysis on VQA dataset and see that our model significantly outperform SOTA methods on BLEU scores. We also show the results wrt baseline models and ablation study.","PeriodicalId":355438,"journal":{"name":"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114743755","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NTIRE 2023 HR NonHomogeneous Dehazing Challenge Report","authors":"C. Ancuti, Cosmin Ancuti, Florin-Alexandru Vasluianu, R. Timofte, Han Zhou, Wei Dong, Yangyi Liu, Jun Chen, Huan Liu, Liangyan Li, Zijun Wu, Yubo Dong, Yuyang Li, Tian Qiu, Yuying He, Yonghong Lu, Yinwei Wu, Zhenxiang Jiang, Songhua Liu, Xingyi Yang, Yongcheng Jing, Bilel Benjdira, Anas M. Ali, A. Koubâa, Hao-Hsiang Yang, I-Hsiang Chen, Wei-Ting Chen, Zhi-Kai Huang, Yi-Chung Chen, Chia-Hsuan Hsieh, Hua-En Chang, Yuan Chiang, Sy-Yen Kuo, Yu Guo, Yuan Gao, R. Liu, Yuxu Lu, Jingxiang Qu, Shengfeng He, Wenqi Ren, Trung Hoang, Haichuan Zhang, Amirsaeed Yazdani, V. Monga, Lehan Yang, Alex Wu, Tiancheng Mai, Xiaofeng Cong, Xuemeng Yin, Xuefei Yin, Hazim Emad, Ahmed Abdallah, Y. Yasser, Dalia Elshahat, Esraa Elbaz, Zhan-ying Li, Wenqing Kuang, Ziwei Luo, Fredrik K. Gustafsson, Zheng Zhao, Jens Sjölund, Thomas Bo Schön, Zhao Zhang, Yanyan Wei, Junhu Wang, Suiyi Zhao, Huan Zheng, Jinkang Guo, Ya Sun, T. Liu, D. Hao, Kui Jiang, Anjali Sarvaiya, Kalpesh P. Prajapati, R. Patra, Pragnesh Barik, C. Rathod, Kishor P. Upl","doi":"10.1109/CVPRW59228.2023.00180","DOIUrl":"https://doi.org/10.1109/CVPRW59228.2023.00180","url":null,"abstract":"This study assesses the outcomes of the NTIRE 2023 Challenge on Non-Homogeneous Dehazing, wherein novel techniques were proposed and evaluated on new image dataset called HD-NH-HAZE. The HD-NH-HAZE dataset contains 50 high resolution pairs of real-life outdoor images featuring nonhomogeneous hazy images and corresponding haze-free images of the same scene. The nonhomogeneous haze was simulated using a professional setup that replicated real-world conditions of hazy scenarios. The competition had 246 participants and 17 teams submitted solutions for the final testing phase. The proposed solutions demonstrated the cutting-edge in image dehazing technology.","PeriodicalId":355438,"journal":{"name":"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115734350","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Deep Learning-based Automatic Checkout System Using Image Enhancement Techniques","authors":"L. Pham, Duong Nguyen-Ngoc Tran, Huy-Hung Nguyen, Hyung-Joon Jeon, T. H. Tran, Hyung-Min Jeon, Jaewook Jeon","doi":"10.1109/CVPRW59228.2023.00562","DOIUrl":"https://doi.org/10.1109/CVPRW59228.2023.00562","url":null,"abstract":"The retail sector has experienced significant growth in artificial intelligence and computer vision applications, particularly with the emergence of automatic checkout (ACO) systems in stores and supermarkets. ACO systems encounter challenges such as object occlusion, motion blur, and similarity between scanned items while acquiring accurate training images for realistic checkout scenarios is difficult due to constant product updates. This paper improves existing deep learning-based ACO solutions by incorporating several image enhancement techniques in the data pre-processing step. The proposed ACO system employs a detect-and-track strategy, which involves: (1) detecting objects in areas of interest; (2) tracking objects in consecutive frames; and (3) counting objects using a track management pipeline. Several data generation techniques—including copy-and-paste, random placement, and augmentation—are employed to create diverse training data. Additionally, the proposed solution is designed as an open-ended framework that can be easily expanded to accommodate multiple tasks. The system has been evaluated on the AI City Challenge 2023 Track 4 dataset, showcasing outstanding performance by achieving a top-1 ranking on test-set A with an F1 score of 0.9792.","PeriodicalId":355438,"journal":{"name":"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"272 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116057528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Dehazing Powered by Image Processing Network","authors":"Guisik Kim, Jin-Hyeong Park, Junseok Kwon","doi":"10.1109/CVPRW59228.2023.00128","DOIUrl":"https://doi.org/10.1109/CVPRW59228.2023.00128","url":null,"abstract":"Image processing is a very fundamental technique in the field of low-level vision. However, with the development of deep learning over the past five years, most low-level vision methods tend to ignore this technique. Recent dehazing methods also refrain from using conventional image processing techniques, whereas only focusing on the development of new deep neural network (DNN) architectures. Unlike this recent trend, we show that image processing techniques are still competitive, if they are incorporated into DNNs. In this paper, we utilize conventional image processing techniques (i.e. curve adjustment, retinex decomposition, and multiple image fusion) for accurate dehazing. Moreover, we employ direct learning for stable dehazing performance. The proposed method can perform with low computational cost and easy to learn. The experimental results demonstrate that the proposed method produces accurate dehazing results compared to recent algorithms.","PeriodicalId":355438,"journal":{"name":"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123367190","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detail-Preserving Self-Supervised Monocular Depth with Self-Supervised Structural Sharpening","authors":"J. Bello, Jaeho Moon, Munchurl Kim","doi":"10.1109/CVPRW59228.2023.00031","DOIUrl":"https://doi.org/10.1109/CVPRW59228.2023.00031","url":null,"abstract":"We propose to further close the gap between self-supervised and fully-supervised methods for the single view depth estimation (SVDE) task in terms of the levels of detail and sharpness in the estimated depth maps. Detailed SVDE is challenging as even fully-supervised methods struggle to obtain detail-preserving depth estimates. While recent works have proposed exploiting semantic masks to improve the structural information in the estimated depth maps, our proposed method yields detail-preserving depth estimates from a single forward pass without increasing the computational cost or requiring additional data. We achieve this by exploiting a missing component in SVDE, Self-Supervised Structural Sharpening, referred to as S4. S4 is a mechanism that encourages a similar level of detail between the RGB input and the depth/disparity output. To this extent, we propose a novel DispNet-S4 network for detail-preserving SVDE. Our network exploits un-blurring and un-noising tasks of clean input images for learning S4 without the need for either additional data (e.g., segmentation masks, matting maps, etc.) or advanced network blocks (attention, transformers, etc.). The recovered structural details in the un-blurring and un-noising operations are transferred to the estimated depth maps via adaptive convolutions to yield structurally sharpened depths that are selectively used for self-supervision. We provide extensive experimental results and ablation studies that show our proposed DispNetS4 network can yield fine details in the depth maps while achieving quantitative metrics comparable to the state-of-the-art for the challenging KITTI dataset.","PeriodicalId":355438,"journal":{"name":"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"88 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123252635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GAN-based Vision Transformer for High-Quality Thermal Image Enhancement","authors":"M. Marnissi, A. Fathallah","doi":"10.1109/CVPRW59228.2023.00089","DOIUrl":"https://doi.org/10.1109/CVPRW59228.2023.00089","url":null,"abstract":"Generative Adversarial Networks (GANs) have shown an outstanding ability to generate high-quality images with visual realism and similarity to real images. This paper presents a new architecture for thermal image enhancement. Precisely, the strengths of architecture-based vision transformers and generative adversarial networks are exploited. The thermal loss feature introduced in our approach is specifically used to produce high-quality images. Thermal image enhancement also relies on fine-tuning based on visible images, resulting in an overall improvement in image quality. A visual quality metric was used to evaluate the performance of the proposed architecture. Significant improvements were found over the original thermal images and other enhancement methods established on a subset of the KAIST dataset. The performance of the proposed enhancement architecture is also verified on the detection results by obtaining better performance with a considerable margin regarding different versions of the YOLO detector.","PeriodicalId":355438,"journal":{"name":"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123644432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Triplet Temporal-based Video Recognition with Multiview for Temporal Action Localization","authors":"Huy Duong Le, Minh Quan Vu, Manh Tung Tran, Nguyen Van Phuc","doi":"10.1109/CVPRW59228.2023.00573","DOIUrl":"https://doi.org/10.1109/CVPRW59228.2023.00573","url":null,"abstract":"Temporal action localization (TAL) in untrimmed videos recently emerged as a crucial research topic, which has been applied in various applications such as surveillance, crowd monitoring, and driver distraction recognition. Most modern approaches in TAL divide this problem into two parts: i) feature extraction for action recognition; and ii) temporal boundary for action localization. In this study, we focus on improving the performance of the TAL task by exploiting the feature extraction effectively. Specifically, we present a temporal triplet algorithm in order to enhance temporal density-dependence information for the input video clips. Moreover, the multiview fusion framework is taken into account for enriching action representation. For the evaluation, we conduct the proposed method on the 2023 AI City Challenge Dataset. Accordingly, our method achieves competitive results and belongs to the top public leaderboard in Track 3 of the Challenge.","PeriodicalId":355438,"journal":{"name":"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124280823","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Difficulty Estimation with Action Scores for Computer Vision Tasks","authors":"Octavio Arriaga, Sebastián M. Palacio, Matias Valdenegro-Toro","doi":"10.1109/CVPRW59228.2023.00030","DOIUrl":"https://doi.org/10.1109/CVPRW59228.2023.00030","url":null,"abstract":"As more machine learning models are now being applied in real world scenarios it has become crucial to evaluate their difficulties and biases. In this paper we present an unsupervised method for calculating a difficulty score based on the accumulated loss per epoch. Our proposed method does not require any modification to the model, neither any external supervision, and it can be easily applied to a wide range of machine learning tasks. We provide results for the tasks of image classification, image segmentation, and object detection. We compare our score against similar metrics and provide theoretical and empirical evidence of their difference. Furthermore, we show applications of our proposed score for detecting incorrect labels, and test for possible biases.","PeriodicalId":355438,"journal":{"name":"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129816305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}