International Journal of Computer Vision: Latest Articles

WeakCLIP: Adapting CLIP for Weakly-Supervised Semantic Segmentation
IF 19.5 | CAS Tier 2 | Computer Science
International Journal of Computer Vision Pub Date: 2024-09-05 DOI: 10.1007/s11263-024-02224-2
Lianghui Zhu, Xinggang Wang, Jiapei Feng, Tianheng Cheng, Yingyue Li, Bo Jiang, Dingwen Zhang, Junwei Han
{"title":"WeakCLIP: Adapting CLIP for Weakly-Supervised Semantic Segmentation","authors":"Lianghui Zhu, Xinggang Wang, Jiapei Feng, Tianheng Cheng, Yingyue Li, Bo Jiang, Dingwen Zhang, Junwei Han","doi":"10.1007/s11263-024-02224-2","DOIUrl":"https://doi.org/10.1007/s11263-024-02224-2","url":null,"abstract":"<p>Contrastive language and image pre-training (CLIP) achieves great success in various computer vision tasks and also presents an opportune avenue for enhancing weakly-supervised image understanding with its large-scale pre-trained knowledge. As an effective way to reduce the reliance on pixel-level human-annotated labels, weakly-supervised semantic segmentation (WSSS) aims to refine the class activation map (CAM) and produce high-quality pseudo masks. Weakly-supervised semantic segmentation (WSSS) aims to refine the class activation map (CAM) as pseudo masks, but heavily relies on inductive biases like hand-crafted priors and digital image processing methods. For the vision-language pre-trained model, i.e. CLIP, we propose a novel text-to-pixel matching paradigm for WSSS. However, directly applying CLIP to WSSS is challenging due to three critical problems: (1) the task gap between contrastive pre-training and WSSS CAM refinement, (2) lacking text-to-pixel modeling to fully utilize the pre-trained knowledge, and (3) the insufficient details owning to the <span>(frac{1}{16})</span> down-sampling resolution of ViT. Thus, we propose WeakCLIP to address the problems and leverage the pre-trained knowledge from CLIP to WSSS. Specifically, we first address the task gap by proposing a pyramid adapter and learnable prompts to extract WSSS-specific representation. We then design a co-attention matching module to model text-to-pixel relationships. Finally, the pyramid adapter and text-guided decoder are introduced to gather multi-level information and integrate it with text guidance hierarchically. WeakCLIP provides an effective and parameter-efficient way to transfer CLIP knowledge to refine CAM. Extensive experiments demonstrate that WeakCLIP achieves the state-of-the-art WSSS performance on standard benchmarks, i.e., 74.0% mIoU on the <i>val</i> set of PASCAL VOC 2012 and 46.1% mIoU on the <i>val</i> set of COCO 2014. The source code and model checkpoints are released at https://github.com/hustvl/WeakCLIP.</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"21 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142138035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Continual Face Forgery Detection via Historical Distribution Preserving
IF 19.5 | CAS Tier 2 | Computer Science
International Journal of Computer Vision Pub Date: 2024-09-04 DOI: 10.1007/s11263-024-02160-1
Ke Sun, Shen Chen, Taiping Yao, Xiaoshuai Sun, Shouhong Ding, Rongrong Ji
{"title":"Continual Face Forgery Detection via Historical Distribution Preserving","authors":"Ke Sun, Shen Chen, Taiping Yao, Xiaoshuai Sun, Shouhong Ding, Rongrong Ji","doi":"10.1007/s11263-024-02160-1","DOIUrl":"https://doi.org/10.1007/s11263-024-02160-1","url":null,"abstract":"<p>Face forgery techniques have advanced rapidly and pose serious security threats. Existing face forgery detection methods try to learn generalizable features, but they still fall short of practical application. Additionally, finetuning these methods on historical training data is resource-intensive in terms of time and storage. In this paper, we focus on a novel and challenging problem: Continual Face Forgery Detection (CFFD), which aims to efficiently learn from new forgery attacks without forgetting previous ones. Specifically, we propose a Historical Distribution Preserving (HDP) framework that reserves and preserves the distributions of historical faces. To achieve this, we use universal adversarial perturbation (UAP) to simulate historical forgery distribution, and knowledge distillation to maintain the distribution variation of real faces across different models. We also construct a new benchmark for CFFD with three evaluation protocols. Our extensive experiments on the benchmarks show that our method outperforms the state-of-the-art competitors. Our code is available at https://github.com/skJack/HDP.</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"20 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2024-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142131051","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Adaptive Fuzzy Positive Learning for Annotation-Scarce Semantic Segmentation
IF 19.5 | CAS Tier 2 | Computer Science
International Journal of Computer Vision Pub Date: 2024-09-02 DOI: 10.1007/s11263-024-02217-1
Pengchong Qiao, Yu Wang, Chang Liu, Lei Shang, Baigui Sun, Zhennan Wang, Xiawu Zheng, Rongrong Ji, Jie Chen
{"title":"Adaptive Fuzzy Positive Learning for Annotation-Scarce Semantic Segmentation","authors":"Pengchong Qiao, Yu Wang, Chang Liu, Lei Shang, Baigui Sun, Zhennan Wang, Xiawu Zheng, Rongrong Ji, Jie Chen","doi":"10.1007/s11263-024-02217-1","DOIUrl":"https://doi.org/10.1007/s11263-024-02217-1","url":null,"abstract":"<p>Annotation-scarce semantic segmentation aims to obtain meaningful pixel-level discrimination with scarce or even no manual annotations, of which the crux is how to utilize unlabeled data by pseudo-label learning. Typical works focus on ameliorating the error-prone pseudo-labeling, e.g., only utilizing high-confidence pseudo labels and filtering low-confidence ones out. But we think differently and resort to exhausting informative semantics from multiple probably correct candidate labels. This brings our method the ability to learn more accurately even though pseudo labels are unreliable. In this paper, we propose Adaptive Fuzzy Positive Learning (A-FPL) for correctly learning unlabeled data in a plug-and-play fashion, targeting adaptively encouraging fuzzy positive predictions and suppressing highly probable negatives. Specifically, A-FPL comprises two main components: (1) Fuzzy positive assignment (FPA) that adaptively assigns fuzzy positive labels to each pixel, while ensuring their quality through a T-value adaption algorithm (2) Fuzzy positive regularization (FPR) that restricts the predictions of fuzzy positive categories to be larger than those of negative categories. Being conceptually simple yet practically effective, A-FPL remarkably alleviates interference from wrong pseudo labels, progressively refining semantic discrimination. Theoretical analysis and extensive experiments on various training settings with consistent performance gain justify the superiority of our approach. Codes are at A-FPL.</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"19 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2024-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142123587","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Revisiting Class-Incremental Learning with Pre-Trained Models: Generalizability and Adaptivity are All You Need
IF 19.5 | CAS Tier 2 | Computer Science
International Journal of Computer Vision Pub Date: 2024-08-31 DOI: 10.1007/s11263-024-02218-0
Da-Wei Zhou, Zi-Wen Cai, Han-Jia Ye, De-Chuan Zhan, Ziwei Liu
{"title":"Revisiting Class-Incremental Learning with Pre-Trained Models: Generalizability and Adaptivity are All You Need","authors":"Da-Wei Zhou, Zi-Wen Cai, Han-Jia Ye, De-Chuan Zhan, Ziwei Liu","doi":"10.1007/s11263-024-02218-0","DOIUrl":"https://doi.org/10.1007/s11263-024-02218-0","url":null,"abstract":"<p>Class-incremental learning (CIL) aims to adapt to emerging new classes without forgetting old ones. Traditional CIL models are trained from scratch to continually acquire knowledge as data evolves. Recently, pre-training has achieved substantial progress, making vast pre-trained models (PTMs) accessible for CIL. Contrary to traditional methods, PTMs possess generalizable embeddings, which can be easily transferred for CIL. In this work, we revisit CIL with PTMs and argue that the core factors in CIL are adaptivity for model updating and generalizability for knowledge transferring. (1) We first reveal that frozen PTM can already provide generalizable embeddings for CIL. Surprisingly, a simple baseline (SimpleCIL) which continually sets the classifiers of PTM to prototype features can beat state-of-the-art even without training on the downstream task. (2) Due to the distribution gap between pre-trained and downstream datasets, PTM can be further cultivated with adaptivity via model adaptation. We propose AdaPt and mERge (<span>Aper</span>), which aggregates the embeddings of PTM and adapted models for classifier construction. <span>Aper </span>is a general framework that can be orthogonally combined with any parameter-efficient tuning method, which holds the advantages of PTM’s generalizability and adapted model’s adaptivity. (3) Additionally, considering previous ImageNet-based benchmarks are unsuitable in the era of PTM due to data overlapping, we propose four new benchmarks for assessment, namely ImageNet-A, ObjectNet, OmniBenchmark, and VTAB. Extensive experiments validate the effectiveness of <span>Aper </span>with a unified and concise framework. Code is available at https://github.com/zhoudw-zdw/RevisitingCIL.</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"20 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2024-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142101363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Systematic Evaluation of Uncertainty Calibration in Pretrained Object Detectors
IF 19.5 | CAS Tier 2 | Computer Science
International Journal of Computer Vision Pub Date: 2024-08-31 DOI: 10.1007/s11263-024-02219-z
Denis Huseljic, Marek Herde, Paul Hahn, Mehmet Müjde, Bernhard Sick
{"title":"Systematic Evaluation of Uncertainty Calibration in Pretrained Object Detectors","authors":"Denis Huseljic, Marek Herde, Paul Hahn, Mehmet Müjde, Bernhard Sick","doi":"10.1007/s11263-024-02219-z","DOIUrl":"https://doi.org/10.1007/s11263-024-02219-z","url":null,"abstract":"<p>In the field of deep learning based computer vision, the development of deep object detection has led to unique paradigms (e.g., two-stage or set-based) and architectures (e.g., <span>Faster-RCNN</span> or <span>DETR</span>) which enable outstanding performance on challenging benchmark datasets. Despite this, the trained object detectors typically do not reliably assess uncertainty regarding their own knowledge, and the quality of their probabilistic predictions is usually poor. As these are often used to make subsequent decisions, such inaccurate probabilistic predictions must be avoided. In this work, we investigate the uncertainty calibration properties of different pretrained object detection architectures in a multi-class setting. We propose a framework to ensure a fair, unbiased, and repeatable evaluation and conduct detailed analyses assessing the calibration under distributional changes (e.g., distributional shift and application to out-of-distribution data). Furthermore, by investigating the influence of different detector paradigms, post-processing steps, and suitable choices of metrics, we deliver novel insights into why poor detector calibration emerges. Based on these insights, we are able to improve the calibration of a detector by simply finetuning its last layer.</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"146 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2024-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142101362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Lightweight High-Speed Photography Built on Coded Exposure and Implicit Neural Representation of Videos
IF 19.5 | CAS Tier 2 | Computer Science
International Journal of Computer Vision Pub Date: 2024-08-30 DOI: 10.1007/s11263-024-02198-1
Zhihong Zhang, Runzhao Yang, Jinli Suo, Yuxiao Cheng, Qionghai Dai
{"title":"Lightweight High-Speed Photography Built on Coded Exposure and Implicit Neural Representation of Videos","authors":"Zhihong Zhang, Runzhao Yang, Jinli Suo, Yuxiao Cheng, Qionghai Dai","doi":"10.1007/s11263-024-02198-1","DOIUrl":"https://doi.org/10.1007/s11263-024-02198-1","url":null,"abstract":"<p>The demand for compact cameras capable of recording high-speed scenes with high resolution is steadily increasing. However, achieving such capabilities often entails high bandwidth requirements, resulting in bulky, heavy systems unsuitable for low-capacity platforms. To address this challenge, leveraging a coded exposure setup to encode a frame sequence into a blurry snapshot and subsequently retrieve the latent sharp video presents a lightweight solution. Nevertheless, restoring motion from blur remains a formidable challenge due to the inherent ill-posedness of motion blur decomposition, the intrinsic ambiguity in motion direction, and the diverse motions present in natural videos. In this study, we propose a novel approach to address these challenges by combining the classical coded exposure imaging technique with the emerging implicit neural representation for videos. We strategically embed motion direction cues into the blurry image during the imaging process. Additionally, we develop a novel implicit neural representation based blur decomposition network to sequentially extract the latent video frames from the blurry image, leveraging the embedded motion direction cues. To validate the effectiveness and efficiency of our proposed framework, we conduct extensive experiments using benchmark datasets and real-captured blurry images. The results demonstrate that our approach significantly outperforms existing methods in terms of both quality and flexibility. The code for our work is available at https://github.com/zhihongz/BDINR.</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"55 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2024-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142101364","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Learning General and Specific Embedding with Transformer for Few-Shot Object Detection
IF 19.5 | CAS Tier 2 | Computer Science
International Journal of Computer Vision Pub Date: 2024-08-28 DOI: 10.1007/s11263-024-02199-0
Xu Zhang, Zhe Chen, Jing Zhang, Tongliang Liu, Dacheng Tao
{"title":"Learning General and Specific Embedding with Transformer for Few-Shot Object Detection","authors":"Xu Zhang, Zhe Chen, Jing Zhang, Tongliang Liu, Dacheng Tao","doi":"10.1007/s11263-024-02199-0","DOIUrl":"https://doi.org/10.1007/s11263-024-02199-0","url":null,"abstract":"<p>Few-shot object detection (FSOD) studies how to detect novel objects with few annotated examples effectively. Recently, it has been demonstrated that decent feature embeddings, including the general feature embeddings that are more invariant to visual changes and the specific feature embeddings that are more discriminative for different object classes, are both important for FSOD. However, current methods lack appropriate mechanisms to sensibly cooperate both types of feature embeddings based on their importance to detecting objects of novel classes, which may result in sub-optimal performance. In this paper, to achieve more effective FSOD, we attempt to explicitly encode both general and specific feature embeddings using learnable tensors and apply a Transformer to help better incorporate them in FSOD according to their relations to the input object features. We thus propose a Transformer-based general and specific embedding learning (T-GSEL) method for FSOD. In T-GSEL, learnable tensors are employed in a three-stage pipeline, encoding feature embeddings in general level, intermediate level, and specific level, respectively. In each stage, we apply a Transformer to first model the relations of the corresponding embedding to input object features and then apply the estimated relations to refine the input features. Meanwhile, we further introduce cross-stage connections between embeddings of different stages to make them complement and cooperate with each other, delivering general, intermediate, and specific feature embeddings stage by stage and utilizing them together for feature refinement in FSOD. In practice, a T-GSEL module is easy to inject. Extensive empirical results further show that our proposed T-GSEL method achieves compelling FSOD performance on both PASCAL VOC and MS COCO datasets compared with other state-of-the-art approaches.</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"5 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142085578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Learning Box Regression and Mask Segmentation Under Long-Tailed Distribution with Gradient Transfusing
IF 19.5 | CAS Tier 2 | Computer Science
International Journal of Computer Vision Pub Date: 2024-08-28 DOI: 10.1007/s11263-024-02104-9
Tao Wang, Li Yuan, Xinchao Wang, Jiashi Feng
{"title":"Learning Box Regression and Mask Segmentation Under Long-Tailed Distribution with Gradient Transfusing","authors":"Tao Wang, Li Yuan, Xinchao Wang, Jiashi Feng","doi":"10.1007/s11263-024-02104-9","DOIUrl":"https://doi.org/10.1007/s11263-024-02104-9","url":null,"abstract":"<p>Learning object detectors under long-tailed data distribution is challenging and has been widely studied recently, the prior works mainly focus on balancing the learning signal of classification task such that samples from tail object classes are effectively recognized. However, the learning difficulty of other class-wise tasks including bounding box regression and mask segmentation are not explored before. In this work, we investigate how long-tailed distribution affects the optimization of box regression and mask segmentation tasks. We find that although the standard class-wise box regression and mask segmentation offer strong class-specific prediction, they suffer from limited training signal and instability on the tail object classes. Aiming to address the limitation, our insight is that the knowledge of box regression and object segmentation is naturally shared across classes. We thus develop a cross class gradient transfusing (CRAT) approach to transfer the abundant training signal from head classes to help the training of sample-scarce tail classes. The transferring process is guided by the Fisher information to aggregate useful signals. CRAT can be seamlessly integrated into existing end-to-end or decoupled long-tailed object detection pipelines to robustly learn class-wise box regression and mask segmentation under long-tailed distribution. Our method improves the state-of-the-art long-tailed object detection and instance segmentation models with an average of 3.0 tail AP on the LVIS benchmark. The code implementation will be available at https://github.com/twangnh/CRAT</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"5 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142085526","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
AROID: Improving Adversarial Robustness Through Online Instance-Wise Data Augmentation
IF 19.5 | CAS Tier 2 | Computer Science
International Journal of Computer Vision Pub Date: 2024-08-24 DOI: 10.1007/s11263-024-02206-4
Lin Li, Jianing Qiu, Michael Spratling
{"title":"AROID: Improving Adversarial Robustness Through Online Instance-Wise Data Augmentation","authors":"Lin Li, Jianing Qiu, Michael Spratling","doi":"10.1007/s11263-024-02206-4","DOIUrl":"https://doi.org/10.1007/s11263-024-02206-4","url":null,"abstract":"<p>Deep neural networks are vulnerable to adversarial examples. Adversarial training (AT) is an effective defense against adversarial examples. However, AT is prone to overfitting which degrades robustness substantially. Recently, data augmentation (DA) was shown to be effective in mitigating robust overfitting if appropriately designed and optimized for AT. This work proposes a new method to automatically learn online, instance-wise, DA policies to improve robust generalization for AT. This is the first automated DA method specific for robustness. A novel policy learning objective, consisting of Vulnerability, Affinity and Diversity, is proposed and shown to be sufficiently effective and efficient to be practical for automatic DA generation during AT. Importantly, our method dramatically reduces the cost of policy search from the 5000 h of AutoAugment and the 412 h of IDBH to 9 h, making automated DA more practical to use for adversarial robustness. This allows our method to efficiently explore a large search space for a more effective DA policy and evolve the policy as training progresses. Empirically, our method is shown to outperform all competitive DA methods across various model architectures and datasets. Our DA policy reinforced vanilla AT to surpass several state-of-the-art AT methods regarding both accuracy and robustness. It can also be combined with those advanced AT methods to further boost robustness. Code and pre-trained models are available at: https://github.com/TreeLLi/AROID.</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"25 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2024-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142085114","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
R²S100K: Road-Region Segmentation Dataset for Semi-supervised Autonomous Driving in the Wild
IF 19.5 | CAS Tier 2 | Computer Science
International Journal of Computer Vision Pub Date: 2024-08-23 DOI: 10.1007/s11263-024-02207-3
Muhammad Atif Butt, Hassan Ali, Adnan Qayyum, Waqas Sultani, Ala Al-Fuqaha, Junaid Qadir
{"title":"R $$^{2}$$ S100K: Road-Region Segmentation Dataset for Semi-supervised Autonomous Driving in the Wild","authors":"Muhammad Atif Butt, Hassan Ali, Adnan Qayyum, Waqas Sultani, Ala Al-Fuqaha, Junaid Qadir","doi":"10.1007/s11263-024-02207-3","DOIUrl":"https://doi.org/10.1007/s11263-024-02207-3","url":null,"abstract":"<p>Semantic understanding of roadways is a key enabling factor for safe autonomous driving. However, existing autonomous driving datasets provide well-structured urban roads while ignoring unstructured roadways containing distress, potholes, water puddles, and various kinds of road patches i.e., earthen, gravel etc. To this end, we introduce Road Region Segmentation dataset (R<sup>2</sup>S100K)—a large-scale dataset and benchmark for training and evaluation of road segmentation in aforementioned challenging unstructured roadways. R<sup>2</sup>S100K comprises 100K images extracted from a large and diverse set of video sequences covering more than 1000 km of roadways. Out of these 100K privacy respecting images, 14,000 images have fine pixel-labeling of road regions, with 86,000 unlabeled images that can be leveraged through semi-supervised learning methods. Alongside, we present an Efficient Data Sampling based self-training framework to improve learning by leveraging unlabeled data. Our experimental results demonstrate that the proposed method significantly improves learning methods in generalizability and reduces the labeling cost for semantic segmentation tasks. Our benchmark will be publicly available to facilitate future research at https://r2s100k.github.io/.</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"21 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2024-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142045651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0