Latest Articles: IEEE Transactions on Pattern Analysis and Machine Intelligence

NeMF: Neural Microphysics Fields
IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2024-09-30. DOI: 10.1109/TPAMI.2024.3467913
Authors: Inbal Kom Betzer, Roi Ronen, Vadim Holodovsky, Yoav Y Schechner, Ilan Koren
Abstract: Inverse problems in scientific imaging often seek physical characterization of heterogeneous scene materials. The scene is thus represented by physical quantities, such as the density and sizes of particles (microphysics) across a domain, and the forward image formation model is physical. An important case is that of clouds, where microphysics in three dimensions (3D) dictate cloud dynamics, lifetime, and albedo, with implications for Earth's energy balance, sustainable energy, and rainfall. Current methods, however, recover very degenerate representations of microphysics. To enable 3D volumetric recovery of all the required microphysical parameters, we introduce the neural microphysics field (NeMF), based on a deep neural network whose input is multi-view polarization images. NeMF is pre-trained through supervised learning; training relies on polarized radiative transfer and on noise modeling in polarization-sensitive sensors. The results offer unprecedented recovery, including droplet effective variance. We test NeMF in rigorous simulations and demonstrate it on real-world polarization-image data.
Citations: 0
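The abstract outlines a supervised pre-training recipe: simulate polarized radiative transfer, corrupt the rendered multi-view polarization images with a sensor noise model, and regress the microphysical fields. A minimal sketch of one such step is below, with a generic model, an MSE loss, and additive Gaussian noise as illustrative stand-ins for the paper's actual architecture and noise model:

```python
import torch

def supervised_step(model, pol_images, gt_fields, optimizer, noise_std=0.01):
    # pol_images: simulated multi-view polarization images, e.g. (B, views, C, H, W)
    # gt_fields:  ground-truth microphysics (e.g., droplet density, effective
    #             radius, effective variance) on a voxel grid
    # Additive Gaussian noise is a toy stand-in for the paper's
    # polarization-sensitive sensor noise model.
    noisy = pol_images + noise_std * torch.randn_like(pol_images)
    pred = model(noisy)                       # network regresses microphysics
    loss = torch.nn.functional.mse_loss(pred, gt_fields)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```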
Tensor Coupled Learning of Incomplete Longitudinal Features and Labels for Clinical Score Regression
IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2024-09-30. DOI: 10.1109/TPAMI.2024.3471800
Authors: Qing Xiao, Guiying Liu, Qianjin Feng, Yu Zhang, Zhenyuan Ning
Abstract: Longitudinal data with incomplete entries pose a significant challenge for clinical score regression over multiple time points. Although many methods estimate longitudinal scores from complete baseline features (i.e., features collected at the initial time point), such snapshot features may overlook beneficial latent longitudinal traits for generalization. Alternatively, certain completion approaches (e.g., tensor decomposition) have been proposed to impute incomplete longitudinal data before score estimation; most of these, however, are transductive and cannot utilize label semantics. This work presents a tensor coupled learning (TCL) paradigm over incomplete longitudinal features and labels for clinical score regression. TCL enjoys three advantages: 1) It drives semantic-aware factor matrices and collaboratively deals with incomplete longitudinal entries (of features and labels), with a dynamic regularizer designed for adaptive attribute selection. 2) It establishes a closed loop connecting baseline features and the coupled factor matrices, which enables inductive inference of longitudinal scores from baseline features alone. 3) It reinforces the information encoding of baseline data by preserving the local manifold of the longitudinal feature space and detecting temporal alterations across multiple time points. Extensive experiments demonstrate the remarkable performance improvement of our method on clinical score regression with incomplete longitudinal data.
Citations: 0
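As a rough, matrix-level illustration of the coupling idea, the sketch below performs one gradient step of a masked factorization in which incomplete features and labels share a subject factor, so label semantics shape the factors used for imputation. The paper's TCL couples tensor factor matrices across time points and adds a dynamic attribute-selection regularizer; both are simplified away here.

```python
import numpy as np

def coupled_step(X, Y, Mx, My, U, V, W, lr=1e-3, lam=1e-2):
    # X: (n, p) features, Y: (n, q) clinical scores; missing entries are
    # assumed zero-filled, with binary masks Mx, My marking observed entries.
    # U: (n, r) shared subject factor; V: (p, r) and W: (q, r) loadings.
    Rx = Mx * (U @ V.T - X)        # residual on observed feature entries
    Ry = My * (U @ W.T - Y)        # residual on observed label entries
    gU = Rx @ V + Ry @ W + lam * U # label residual Ry couples labels into U
    gV = Rx.T @ U + lam * V
    gW = Ry.T @ U + lam * W
    U -= lr * gU
    V -= lr * gV
    W -= lr * gW
    return U, V, W
```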
Z-Splat: Z-Axis Gaussian Splatting for Camera-Sonar Fusion
IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2024-09-23. DOI: 10.1109/TPAMI.2024.3462290
Authors: Ziyuan Qu, Omkar Vengurlekar, Mohamad Qadri, Kevin Zhang, Michael Kaess, Christopher Metzler, Suren Jayasuriya, Adithya Pediredla
Abstract: Differentiable 3D Gaussian splatting (GS) is emerging as a prominent technique in computer vision and graphics for reconstructing 3D scenes. GS represents a scene as a set of 3D Gaussians with varying opacities and employs a computationally efficient splatting operation, along with analytical derivatives, to compute the 3D Gaussian parameters given scene images captured from various viewpoints. Unfortunately, capturing surround-view (360° viewpoint) images is impossible or impractical in many real-world imaging scenarios, including underwater imaging, rooms inside a building, and autonomous navigation. In these restricted-baseline imaging scenarios, the GS algorithm suffers from the well-known 'missing cone' problem, which results in poor reconstruction along the depth axis. In this paper, we demonstrate that using transient data (from sonars) allows us to address the missing-cone problem by sampling high-frequency data along the depth axis. We extend the Gaussian splatting algorithm to two commonly used sonars and propose fusion algorithms that simultaneously utilize RGB camera data and sonar data. Through simulations, emulations, and hardware experiments across various imaging scenarios, we show that the proposed fusion algorithms lead to significantly better novel-view synthesis (5 dB improvement in PSNR) and 3D geometry reconstruction (60% lower Chamfer distance).
Citations: 0
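Schematically, the fusion objective adds a term on simulated sonar transients to the usual image-space splatting loss; the sonar term supplies the depth-axis information that a restricted camera baseline cannot. The sketch below is illustrative only (the loss choices and weighting are assumptions, and the paper derives renderers for two specific sonar types):

```python
import torch
import torch.nn.functional as F

def fused_splatting_loss(rgb_render, rgb_gt, sonar_render, sonar_gt,
                         lambda_sonar=0.1):
    # rgb_render/rgb_gt: splatted vs. captured images from the camera views.
    # sonar_render/sonar_gt: simulated vs. measured sonar transients, which
    # constrain geometry along the depth (z) axis.
    loss_rgb = F.l1_loss(rgb_render, rgb_gt)         # photometric term
    loss_sonar = F.mse_loss(sonar_render, sonar_gt)  # depth-axis supervision
    return loss_rgb + lambda_sonar * loss_sonar
```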
Adverse Weather Optical Flow: Cumulative Homogeneous-Heterogeneous Adaptation
IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2024-09-23. DOI: 10.1109/TPAMI.2024.3466241
Authors: Hanyu Zhou, Yi Chang, Zhiwei Shi, Wending Yan, Gang Chen, Yonghong Tian, Luxin Yan
Abstract: Optical flow has made great progress in clean scenes but degrades under adverse weather, which violates the brightness-constancy and gradient-continuity assumptions of optical flow. Existing methods typically adopt domain adaptation to transfer motion knowledge from the clean to the degraded domain through one-stage adaptation. However, this direct adaptation is ineffective, since adverse weather and scene style create a large gap between the clean and real degraded domains. Moreover, even within the degraded domain itself, static weather (e.g., fog) and dynamic weather (e.g., rain) affect optical flow differently. To address these issues, we explore a synthetic degraded domain as an intermediate bridge between the clean and real degraded domains, and propose a cumulative homogeneous-heterogeneous adaptation framework for optical flow under real adverse weather. Specifically, for the clean-to-degraded transfer, our key insight is that static weather possesses a depth-associated homogeneous feature that does not change the intrinsic motion of the scene, while dynamic weather additionally introduces a heterogeneous feature that causes a significant boundary discrepancy in warp errors between the clean and degraded domains. For the synthetic-to-real transfer, we observe that cost-volume correlation shares a similar statistical histogram between the synthetic and real degraded domains, which benefits holistic alignment of the homogeneous correlation distribution for synthetic-to-real knowledge distillation. Under this unified framework, the proposed method can progressively and explicitly transfer knowledge from clean scenes to real adverse weather. In addition, we collect a real adverse-weather dataset with manually annotated optical flow labels and perform extensive experiments to verify the superiority of the proposed method. Both the code and the dataset will be available at https://github.com/hyzhouboy/CH2DA-Flow.
Citations: 0
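For the synthetic-to-real stage, the key observation is that cost-volume correlations share similar statistical histograms across the two degraded domains. One simple, differentiable way to encourage such alignment is a quantile-matching (1-D Wasserstein-style) loss over correlation values, sketched below as an assumed stand-in for the paper's actual distillation term:

```python
import torch

def correlation_distribution_loss(corr_syn, corr_real):
    # corr_syn, corr_real: cost-volume correlation tensors from the synthetic
    # and real degraded domains (any shape). Comparing sorted values matches
    # their empirical distributions without requiring a hard histogram.
    a = torch.sort(corr_syn.flatten()).values
    b = torch.sort(corr_real.flatten()).values
    n = min(a.numel(), b.numel())
    # Compare matched quantiles (subsample the longer sequence).
    idx_a = torch.linspace(0, a.numel() - 1, n).long()
    idx_b = torch.linspace(0, b.numel() - 1, n).long()
    return torch.mean((a[idx_a] - b[idx_b]) ** 2)
```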
Generalized Relevance Learning Grassmann Quantization
IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2024-09-23. DOI: 10.1109/TPAMI.2024.3466315
Authors: M Mohammadi, M Babai, M H F Wilkinson
Abstract: Due to advancements in digital cameras, it is easy to gather multiple images (or videos) of an object under different conditions. Image-set classification has therefore attracted increasing attention, and different solutions have been proposed to model image sets. A popular way to model image sets is with subspaces, which form a manifold called the Grassmann manifold. In this contribution, we extend Generalized Relevance Learning Vector Quantization to deal with the Grassmann manifold. The proposed model returns a set of prototype subspaces and a relevance vector. While the prototypes model typical behaviour within classes, the relevance factors specify the most discriminative principal vectors (or images) for the classification task. Both provide insight into the model's decisions by highlighting the images and pixels that are influential for predictions. Moreover, because the model learns prototypes, its complexity during inference is independent of the dataset size, unlike previous works. We applied it to several recognition tasks, including handwritten digit recognition, face recognition, activity recognition, and object recognition. Experiments demonstrate that it outperforms previous works with lower complexity and can successfully model variation such as handwriting style or lighting conditions. Moreover, the presence of relevances makes the model robust to the choice of subspace dimensionality.
Citations: 0
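The core computation such a model needs is a distance between an input subspace and a prototype subspace in which each principal direction is weighted by a learned relevance factor. A minimal sketch, assuming orthonormal bases and taking the standard SVD route to principal angles:

```python
import numpy as np

def grassmann_relevance_distance(U, V, relevance):
    # U, V: (d, k) matrices with orthonormal columns spanning two k-dim
    # subspaces; relevance: (k,) non-negative weights summing to 1.
    # Singular values of U^T V are the cosines of the principal angles.
    s = np.linalg.svd(U.T @ V, compute_uv=False)
    s = np.clip(s, -1.0, 1.0)
    theta = np.arccos(s)
    # Relevance-weighted chordal-type distance.
    return float(np.sum(relevance * np.sin(theta) ** 2))
```

An LVQ-style update would then pull the nearest correct prototype toward the input and push the nearest incorrect one away under this metric, adapting the relevance vector alongside the prototypes.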
Estimating Per-Class Statistics for Label Noise Learning
IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2024-09-23. DOI: 10.1109/TPAMI.2024.3466182
Authors: Wenshui Luo, Shuo Chen, Tongliang Liu, Bo Han, Gang Niu, Masashi Sugiyama, Dacheng Tao, Chen Gong
Abstract: Real-world data may contain a considerable amount of noisily labeled examples, which usually mislead the training algorithm and degrade classification performance on test data. Label Noise Learning (LNL) was therefore proposed, and one popular research trend within it focuses on estimating critical statistics (e.g., the sample mean and sample covariance) to recover the clean data distribution. However, existing methods may suffer from an unreliable sample selection process or can hardly be applied to multi-class cases. Inspired by centroid estimation theory, we propose Per-Class Statistic Estimation (PCSE), which establishes the quantitative relationship between the clean (first-order and second-order) statistics and the corresponding noisy statistics for every class. This relationship is further utilized to induce a generative classifier for model inference. Unlike existing methods, our approach does not require sample selection at the instance level. Moreover, PCSE can serve as a general post-processing strategy applicable to various popular networks pre-trained on a noisy dataset to boost their classification performance. Theoretically, we prove that the estimated statistics converge to their ground-truth values as the sample size increases, even if the label transition matrix is biased. Empirically, we conducted intensive experiments on various binary and multi-class datasets, and the results demonstrate that PCSE achieves more precise statistic estimation and higher classification accuracy compared with state-of-the-art LNL methods.
Citations: 0
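To make the statistic relationship concrete for the first-order case: under class-conditional label noise with transition matrix T, each noisy-class mean is a mixture of clean-class means, so the clean means can be recovered by inverting a small linear system. A self-contained sketch under that assumption (the paper also handles second-order statistics and a biased transition matrix):

```python
import numpy as np

def estimate_clean_means(X, y_noisy, T, priors):
    # X: (n, d) features; y_noisy: (n,) observed labels.
    # T[i, j] = P(noisy=j | clean=i); priors[i] = P(clean=i), assumed known
    # or estimated separately.
    n_classes = T.shape[0]

    # Empirical means of each *noisy* class.
    noisy_means = np.stack(
        [X[y_noisy == j].mean(axis=0) for j in range(n_classes)]
    )

    # Mixing matrix A with A[j, i] = P(clean=i | noisy=j), via Bayes' rule.
    joint = priors[:, None] * T                      # joint[i, j] = P(clean=i, noisy=j)
    A = (joint / joint.sum(axis=0, keepdims=True)).T

    # noisy_means ≈ A @ clean_means, so invert the linear system.
    clean_means = np.linalg.solve(A, noisy_means)
    return clean_means
```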
Test-time Training for Hyperspectral Image Super-resolution
IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2024-09-23. DOI: 10.1109/TPAMI.2024.3461807
Authors: Ke Li, Luc Van Gool, Dengxin Dai
Abstract: Progress on hyperspectral image (HSI) super-resolution (SR) still lags behind research on RGB image SR. HSIs usually have a large number of spectral bands, so accurately modeling spectral-band interaction for HSI SR is hard. Training data for HSI SR is also hard to obtain, so datasets are usually rather small. In this work, we propose a new test-time training method to tackle this problem. Specifically, we develop a novel self-training framework that generates more accurate pseudo-labels and more accurate LR-HR relationships, so that the model can be further trained on them to improve performance. To better support our test-time training method, we also propose a new network architecture that learns HSI SR without modeling spectral-band interaction, and a new data augmentation method, Spectral Mixup, to increase the diversity of the training data at test time. We also collect a new HSI dataset with a diverse set of images of interesting objects ranging from food to vegetation, materials, and general scenes. Extensive experiments on multiple datasets show that our method significantly improves the performance of pre-trained models after test-time training and significantly outperforms competing methods for HSI SR.
Citations: 0
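Spectral Mixup is only named in the abstract, so the following is a guess at its spirit rather than its definition: augment a hyperspectral cube by taking random convex combinations of its own bands, perturbing spectra while leaving spatial content intact.

```python
import numpy as np

def spectral_mixup(hsi, alpha=0.2, rng=None):
    # hsi: array of shape (bands, H, W). Each output band is a random convex
    # combination of input bands; small alpha keeps the mixing close to the
    # identity. Hypothetical rendition of the augmentation named in the paper.
    rng = np.random.default_rng(rng)
    b = hsi.shape[0]
    # Row-stochastic mixing matrix: rows sum to 1, so spectra stay convex mixes.
    M = (1 - alpha) * np.eye(b) + alpha * rng.dirichlet(np.ones(b), size=b)
    return np.tensordot(M, hsi, axes=1)
```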
CoVR-2: Automatic Data Construction for Composed Video Retrieval
IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2024-09-20. DOI: 10.1109/TPAMI.2024.3463799
Authors: Lucas Ventura, Antoine Yang, Cordelia Schmid, Gül Varol
Abstract: Composed Image Retrieval (CoIR) has recently gained popularity as a task that considers both text and image queries together, to search for relevant images in a database. Most CoIR approaches require manually annotated datasets comprising image-text-image triplets, where the text describes a modification from the query image to the target image. However, manual curation of CoIR triplets is expensive and prevents scalability. In this work, we instead propose a scalable automatic dataset-creation methodology that generates triplets from video-caption pairs, while also expanding the scope of the task to include Composed Video Retrieval (CoVR). To this end, we mine paired videos with similar captions from a large database and leverage a large language model to generate the corresponding modification text. Applying this methodology to the extensive WebVid2M collection, we automatically construct our WebVid-CoVR dataset, resulting in 1.6 million triplets. Moreover, we introduce a new benchmark for CoVR with a manually annotated evaluation set, along with baseline results. We further validate that our methodology is equally applicable to image-caption pairs by generating 3.3 million CoIR training triplets using the Conceptual Captions dataset. Our model builds on BLIP-2 pretraining, adapting it to composed video (or image) retrieval, and incorporates an additional caption retrieval loss to exploit extra supervision beyond the triplet, which is possible since captions are readily available for our training data by design. We provide extensive ablations to analyze the design choices on our new CoVR benchmark. Our experiments also demonstrate that training a CoVR model on our datasets effectively transfers to CoIR, leading to improved state-of-the-art performance in the zero-shot setup on the CIRR, FashionIQ, and CIRCO benchmarks.
Citations: 0
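The pipeline hinges on mining pairs of videos whose captions are nearly identical; a language model then describes the difference between each pair. Given precomputed caption embeddings, a brute-force version of the mining fits in a few lines. This is illustrative only: at WebVid2M scale, the mining must handle near-duplicates and efficiency far more carefully.

```python
import numpy as np

def mine_caption_pairs(caption_embs, sim_thresh=0.85, max_pairs=100_000):
    # caption_embs: (n, d) L2-normalized caption embeddings, so the dot
    # product is cosine similarity. Returns index pairs of distinct videos
    # whose captions nearly match, ready for modification-text generation.
    sims = caption_embs @ caption_embs.T
    np.fill_diagonal(sims, -1.0)            # exclude self-pairs
    i, j = np.where(sims > sim_thresh)
    keep = i < j                            # keep each unordered pair once
    pairs = list(zip(i[keep], j[keep]))
    return pairs[:max_pairs]
```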
Continual Learning From a Stream of APIs
IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2024-09-20. DOI: 10.1109/TPAMI.2024.3460871
Authors: Enneng Yang, Zhenyi Wang, Li Shen, Nan Yin, Tongliang Liu, Guibing Guo, Xingwei Wang, Dacheng Tao
Abstract: Continual learning (CL) aims to learn new tasks without forgetting previous tasks. However, existing CL methods require a large amount of raw data, which is often unavailable due to copyright considerations and privacy risks. Instead, stakeholders usually release pre-trained machine learning models as a service (MLaaS), which users can access via APIs. This paper considers two practical-yet-novel CL settings: data-efficient CL (DECL-APIs) and data-free CL (DFCL-APIs), which achieve CL from a stream of APIs with partial or no raw data. Performing CL under these two new settings faces several challenges: unavailable full raw data, unknown model parameters, heterogeneous models of arbitrary architecture and scale, and catastrophic forgetting of previous APIs. To overcome these issues, we propose a novel data-free cooperative continual distillation learning framework that distills knowledge from a stream of APIs into a CL model by generating pseudo data, just by querying APIs. Specifically, our framework includes two cooperative generators and one CL model, forming their training as an adversarial game. We first use the CL model and the current API as fixed discriminators to train the generators via a derivative-free method. The generators adversarially generate hard and diverse synthetic data to maximize the response gap between the CL model and the API. Next, we train the CL model by minimizing the gap between the responses of the CL model and the black-box API on synthetic data, to transfer the API's knowledge to the CL model. Furthermore, we propose a new regularization term based on network similarity to prevent catastrophic forgetting of previous APIs. Our method performs comparably to classic CL with full raw data on the MNIST and SVHN datasets in the DFCL-APIs setting. In the DECL-APIs setting, our method achieves 0.97×, 0.75×, and 0.69× the performance of classic CL on the more challenging CIFAR10, CIFAR100, and MiniImageNet, respectively.
Citations: 0
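The adversarial game alternates two moves: the generators synthesize inputs that maximize the response gap between the CL model and the current API, and the CL model then minimizes that gap. Below is a toy gradient-based rendition; note that the paper updates generators with a derivative-free method (the API is a black box) and adds a similarity-based forgetting regularizer, both omitted here.

```python
import torch
import torch.nn.functional as F

def distillation_round(generator, cl_model, api_query, g_opt, m_opt,
                       z_dim=128, batch=64):
    # api_query is assumed to return soft predictions (probabilities)
    # from the black-box API for a batch of inputs.

    # Generator move: synthesize data maximizing student/API disagreement.
    x = generator(torch.randn(batch, z_dim))
    with torch.no_grad():
        api_probs = api_query(x)
    gap = F.kl_div(F.log_softmax(cl_model(x), dim=1), api_probs,
                   reduction="batchmean")
    g_opt.zero_grad()
    (-gap).backward()              # ascend the response gap
    g_opt.step()

    # Student move: match the API on fresh synthetic data.
    with torch.no_grad():
        x = generator(torch.randn(batch, z_dim))
        api_probs = api_query(x)
    loss = F.kl_div(F.log_softmax(cl_model(x), dim=1), api_probs,
                    reduction="batchmean")
    m_opt.zero_grad()
    loss.backward()
    m_opt.step()
    return loss.item()
```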
OOD-CV-v2: An Extended Benchmark for Robustness to Out-of-Distribution Shifts of Individual Nuisances in Natural Images
IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2024-09-17. DOI: 10.1109/TPAMI.2024.3462293
Authors: Bingchen Zhao, Jiahao Wang, Wufei Ma, Artur Jesslen, Siwei Yang, Shaozuo Yu, Oliver Zendel, Christian Theobalt, Alan L. Yuille, Adam Kortylewski
Abstract: Enhancing the robustness of vision algorithms in real-world scenarios is challenging. One reason is that existing robustness benchmarks are limited, as they either rely on synthetic data or ignore the effects of individual nuisance factors. We introduce OOD-CV-v2, a benchmark dataset that includes out-of-distribution examples of 10 object categories in terms of pose, shape, texture, context, and weather conditions, and enables benchmarking of models for image classification, object detection, and 3D pose estimation. In addition to this novel dataset, we contribute extensive experiments using popular baseline methods, which reveal that: 1) some nuisance factors have a much stronger negative effect on performance than others, depending also on the vision task; 2) current approaches to enhance robustness have only marginal effects, and can even reduce robustness; 3) we do not observe significant differences between convolutional and transformer architectures. We believe our dataset provides a rich test bed to study robustness and will help push forward research in this area.
Citations: 0
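Since the benchmark's point is to isolate individual nuisance factors, evaluation reduces to scoring a model separately on each nuisance split. A hypothetical helper illustrating that protocol (not the benchmark's official tooling):

```python
import torch

def per_nuisance_accuracy(model, loaders_by_nuisance):
    # loaders_by_nuisance: dict mapping a nuisance name ('pose', 'shape',
    # 'texture', 'context', 'weather') to a DataLoader of (image, label)
    # batches drawn from that out-of-distribution split.
    model.eval()
    results = {}
    with torch.no_grad():
        for name, loader in loaders_by_nuisance.items():
            correct, total = 0, 0
            for images, labels in loader:
                preds = model(images).argmax(dim=1)
                correct += (preds == labels).sum().item()
                total += labels.numel()
            results[name] = correct / max(total, 1)
    return results
```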