Latest Articles in IEEE Transactions on Pattern Analysis and Machine Intelligence

A Novel and Effective Method to Directly Solve Spectral Clustering
IEEE Transactions on Pattern Analysis and Machine Intelligence | Pub Date: 2024-08-21 | DOI: 10.1109/TPAMI.2024.3447287
Feiping Nie; Chaodie Liu; Rong Wang; Xuelong Li
Abstract: Spectral clustering has been attracting increasing attention due to its well-defined framework and excellent performance. However, most traditional spectral clustering methods consist of two separate steps: 1) solving a relaxed optimization problem to learn continuous clustering labels, and 2) rounding the continuous labels into discrete ones. This relax-and-discretize strategy inevitably incurs information loss and unsatisfactory clustering performance. Moreover, the similarity matrix constructed from the original data may not be optimal for clustering, since data usually contain noise and redundancy. To address these problems, we propose a novel and effective algorithm that directly optimizes the original spectral clustering model, called Direct Spectral Clustering (DSC). We theoretically prove that the original spectral clustering model can be solved by simultaneously learning a weighted discrete indicator matrix and a structured similarity matrix whose number of connected components equals the number of clusters. Both can be used to obtain the final clustering results directly, without any post-processing. Further, an effective iterative optimization algorithm is derived to solve the proposed model. Extensive experiments on synthetic and real-world datasets demonstrate the superiority and effectiveness of the proposed method compared to state-of-the-art algorithms.
Vol. 46, no. 12, pp. 10863-10875.
Citations: 0
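For readers unfamiliar with the relax-and-discretize baseline that DSC is designed to replace, the classic two-step pipeline can be sketched in a few lines: relax the discrete indicator matrix into the bottom eigenvectors of the normalized graph Laplacian, then round with k-means. A minimal sketch of that baseline (not the authors' DSC algorithm), using numpy and scikit-learn:

import numpy as np
from sklearn.cluster import KMeans

def two_step_spectral_clustering(W, n_clusters):
    # W: symmetric (n, n) similarity matrix built from the data.
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d + 1e-12))
    L_sym = np.eye(len(W)) - D_inv_sqrt @ W @ D_inv_sqrt   # normalized Laplacian
    # Step 1 (relaxation): continuous labels = bottom n_clusters eigenvectors.
    _, eigvecs = np.linalg.eigh(L_sym)
    F = eigvecs[:, :n_clusters]
    F /= np.linalg.norm(F, axis=1, keepdims=True) + 1e-12
    # Step 2 (discretization): round the continuous embedding with k-means.
    # This post-processing is exactly where the information loss criticized
    # in the abstract occurs; DSC instead learns a discrete indicator directly.
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(F)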
CO-Net++: A Cohesive Network for Multiple Point Cloud Tasks at Once With Two-Stage Feature Rectification
IEEE Transactions on Pattern Analysis and Machine Intelligence | Pub Date: 2024-08-21 | DOI: 10.1109/TPAMI.2024.3447008
Tao Xie; Kun Dai; Qihao Sun; Zhiqiang Jiang; Chuqing Cao; Lijun Zhao; Ke Wang; Ruifeng Li
Abstract: We present CO-Net++, a cohesive framework that optimizes multiple point cloud tasks collectively across heterogeneous dataset domains with a two-stage feature rectification strategy (TFRS). The core of CO-Net++ lies in optimizing task-shared parameters to capture universal features across tasks while discerning task-specific parameters tailored to each task's unique characteristics. Specifically, TFRS distinctly separates the optimization of task-shared and task-specific parameters. In the first stage, TFRS configures all backbone parameters as task-shared, encouraging CO-Net++ to thoroughly assimilate universal attributes pertinent to all tasks, and introduces a sign-based gradient surgery to facilitate their optimization, alleviating the conflicting gradients induced by the different dataset domains. In the second stage, TFRS freezes the task-shared parameters and flexibly integrates task-specific parameters into the network to encode the specific characteristics of each dataset domain. CO-Net++ thus mitigates the conflicting optimization caused by parameter entanglement, ensuring sufficient identification of both universal and specific features. Extensive experiments show that CO-Net++ achieves exceptional performance on both 3D object detection and 3D semantic segmentation, delivers an impressive incremental learning capability, and prevents catastrophic forgetting when generalizing to new point cloud tasks.
Vol. 46, no. 12, pp. 10911-10928.
Citations: 0
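The abstract does not spell out the sign-based gradient surgery rule. One common sign-based scheme, shown here purely as a hypothetical illustration, merges per-task gradients on the task-shared parameters only where their signs agree, zeroing out conflicting coordinates:

import numpy as np

def sign_based_gradient_surgery(grads):
    # grads: list of per-task gradient vectors for the task-shared
    # parameters, one per dataset domain, all of the same length.
    # NOTE: this rule is an assumption for illustration, not CO-Net++'s.
    G = np.stack(grads)                          # (n_tasks, n_params)
    signs = np.sign(G)
    # A coordinate is conflict-free when every non-zero task gradient
    # there carries the same sign.
    agree = np.abs(signs.sum(axis=0)) == np.count_nonzero(signs, axis=0)
    merged = G.mean(axis=0)
    merged[~agree] = 0.0                         # drop conflicting coordinates
    return merged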
Q-Bench+: A Benchmark for Multi-Modal Foundation Models on Low-Level Vision From Single Images to Pairs
IEEE Transactions on Pattern Analysis and Machine Intelligence | Pub Date: 2024-08-21 | DOI: 10.1109/TPAMI.2024.3445770
Zicheng Zhang; Haoning Wu; Erli Zhang; Guangtao Zhai; Weisi Lin
Abstract: The rapid development of Multi-modality Large Language Models (MLLMs) has driven a paradigm shift in computer vision toward versatile foundation models. However, evaluating MLLMs on low-level visual perception and understanding remains a yet-to-explore domain. To this end, we design benchmark settings that emulate human language responses related to low-level vision: low-level visual perception (A1), via visual question answering about low-level attributes (e.g., clarity, lighting), and low-level visual description (A2), which evaluates MLLMs on low-level text descriptions. Furthermore, since pairwise comparison better avoids ambiguity in responses and has been adopted by many human experiments, we extend these perception and description evaluations from single images to image pairs. Specifically, for perception (A1) we construct the LLVisionQA+ dataset, comprising 2,990 single images and 1,999 image pairs, each accompanied by an open-ended question about its low-level features; for description (A2) we propose the LLDescribe+ dataset, evaluating MLLMs on low-level descriptions for 499 single images and 450 pairs. Additionally, we evaluate MLLMs on assessment (A3) ability, i.e., score prediction, employing a softmax-based approach that enables all MLLMs to generate quantifiable quality ratings, tested against human opinions on 7 image quality assessment (IQA) datasets. Across 24 evaluated MLLMs, we demonstrate that several have decent low-level visual competencies on single images, but only GPT-4V exhibits higher accuracy on pairwise comparisons than on single-image evaluations (as humans do). We hope that our benchmark will motivate further research into uncovering and enhancing these nascent capabilities of MLLMs.
Vol. 46, no. 12, pp. 10404-10418.
Citations: 0
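The softmax-based scoring used for the assessment (A3) task can be illustrated generically: read off the model's logits for a small vocabulary of rating words, softmax them, and take the expected rating level as a continuous score. A sketch assuming a five-level vocabulary (the exact token set used by the benchmark may differ):

import numpy as np

def softmax_quality_score(rating_logits, levels=(1, 2, 3, 4, 5)):
    # rating_logits: logits the MLLM assigns at the answer position to
    # rating tokens, e.g. "bad", "poor", "fair", "good", "excellent"
    # (an assumed vocabulary for illustration).
    z = np.asarray(rating_logits, dtype=float)
    p = np.exp(z - z.max())
    p /= p.sum()
    # Expected rating level: a quantifiable quality score in [1, 5] that
    # can be correlated with human mean opinion scores on IQA datasets.
    return float(np.dot(p, levels))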
Tensorized and Compressed Multi-View Subspace Clustering via Structured Constraint
IEEE Transactions on Pattern Analysis and Machine Intelligence | Pub Date: 2024-08-20 | DOI: 10.1109/TPAMI.2024.3446537
Wei Chang; Huimin Chen; Feiping Nie; Rong Wang; Xuelong Li
Abstract: Multi-view learning has attracted increasing attention in recent years. However, traditional approaches focus only on the differences among views while ignoring their consistency, so views afflicted by abnormal data or noise can become ineffective during view learning. Besides, current datasets are increasingly high-dimensional and large-scale. This paper therefore proposes a novel multi-view compressed subspace learning method via a low-rank tensor constraint, which incorporates clustering and multi-view learning into a unified framework. First, for each view, we take a subset of the samples to build a small-size dictionary, which greatly reduces both redundant information and computation cost. Then, to find the consistency and differences among views, we impose a low-rank tensor constraint on these representations and further design an auto-weighted mechanism to learn the optimal representation. Last, because the learned representation is non-square, a bipartite graph is introduced; under the structured constraint, the clustering results can be obtained directly from this graph without any post-processing. Extensive experiments on synthetic and real-world benchmark datasets demonstrate the efficacy and efficiency of our method, especially for views with noise or outliers.
Vol. 46, no. 12, pp. 10434-10451.
Citations: 0
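The small-size dictionary idea can be pictured per view: sample a handful of anchors as a dictionary D and represent every sample as a regularized least-squares code over D, giving a non-square representation on which a bipartite graph can be built. A minimal sketch of anchor coding under these assumptions (an illustration only, not the paper's full tensorized objective):

import numpy as np

def anchor_codes(X, n_anchors, reg=1e-3, seed=0):
    # X: (n_samples, n_features) data matrix of one view.
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=n_anchors, replace=False)
    D = X[idx]                                   # small-size dictionary
    # Ridge-regularized codes: Z = argmin ||X - Z D||^2 + reg * ||Z||^2,
    # solved in closed form via the normal equations.
    A = D @ D.T + reg * np.eye(n_anchors)
    Z = np.linalg.solve(A, D @ X.T).T            # (n_samples, n_anchors)
    return Z, idx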
A Survey on Graph Neural Networks and Graph Transformers in Computer Vision: A Task-Oriented Perspective
IEEE Transactions on Pattern Analysis and Machine Intelligence | Pub Date: 2024-08-19 | DOI: 10.1109/TPAMI.2024.3445463
Chaoqi Chen; Yushuang Wu; Qiyuan Dai; Hong-Yu Zhou; Mutian Xu; Sibei Yang; Xiaoguang Han; Yizhou Yu
Abstract: Graph Neural Networks (GNNs) have gained momentum in graph representation learning and boosted the state of the art in a variety of areas, such as data mining (e.g., social network analysis and recommender systems), computer vision (e.g., object detection and point cloud learning), and natural language processing (e.g., relation extraction and sequence learning), to name a few. With the emergence of Transformers in natural language processing and computer vision, graph Transformers embed a graph structure into the Transformer architecture to overcome the limitations of local neighborhood aggregation while avoiding strict structural inductive biases. In this paper, we present a comprehensive review of GNNs and graph Transformers in computer vision from a task-oriented perspective. Specifically, we divide their applications in computer vision into five categories according to the modality of input data: 2D natural images, videos, 3D data, vision + language, and medical images. Within each category, we further organize the applications by vision task. This task-oriented taxonomy allows us to examine how each task is tackled by different GNN-based approaches and how well these approaches perform. After the necessary preliminaries, we provide definitions and challenges for the tasks, in-depth coverage of representative approaches, and discussions of insights, limitations, and future directions.
Vol. 46, no. 12, pp. 10297-10318.
Citations: 0
Approaching the Global Nash Equilibrium of Non-Convex Multi-Player Games
IEEE Transactions on Pattern Analysis and Machine Intelligence | Pub Date: 2024-08-19 | DOI: 10.1109/TPAMI.2024.3445666
Guanpu Chen; Gehui Xu; Fengxiang He; Yiguang Hong; Leszek Rutkowski; Dacheng Tao
Abstract: Many machine learning problems can be formulated as non-convex multi-player games. Due to non-convexity, it is challenging to obtain the existence condition of the global Nash equilibrium (NE) and to design theoretically guaranteed algorithms. This paper studies a class of non-convex multi-player games in which players' payoff functions consist of canonical functions and quadratic operators. We leverage conjugate properties to transform the complementary problem into a variational inequality (VI) problem via a continuous pseudo-gradient mapping, and we prove an existence condition for the global NE: the solution to the VI problem satisfies a duality relation. We then design an ordinary differential equation that approaches the global NE at an exponential convergence rate. For practical implementation, we derive a discretized algorithm and apply it to two scenarios: multi-player games with generalized monotonicity and multi-player potential games. In these two settings, step sizes of $\mathcal{O}(1/k)$ and $\mathcal{O}(1/\sqrt{k})$ are required to yield convergence rates of $\mathcal{O}(1/k)$ and $\mathcal{O}(1/\sqrt{k})$, respectively. Extensive experiments on robust neural network training and sensor network localization validate our theory. Our code is available at https://github.com/GuanpuChen/Global-NE.
Vol. 46, no. 12, pp. 10797-10813.
Citations: 0
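In generic variational inequality notation (a restatement of the abstract's pipeline, not the paper's exact equations), with $F$ the continuous pseudo-gradient mapping on the joint strategy space $\Omega$ and projection onto $\Omega$ omitted for brevity:

\begin{align*}
&\text{find } x^{*} \in \Omega \ \text{such that} \ \langle F(x^{*}),\, x - x^{*} \rangle \ge 0 \quad \forall x \in \Omega, \\
&\dot{x}(t) = -F(x(t)) \quad \text{(continuous-time dynamics approaching the global NE)}, \\
&x_{k+1} = x_{k} - \alpha_{k} F(x_{k}) \quad \text{(discretized algorithm)},
\end{align*}

where choosing $\alpha_k = \mathcal{O}(1/k)$ in the generalized-monotone setting and $\alpha_k = \mathcal{O}(1/\sqrt{k})$ in the potential-game setting yields the $\mathcal{O}(1/k)$ and $\mathcal{O}(1/\sqrt{k})$ rates quoted above.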
Unsupervised Part Discovery via Dual Representation Alignment
IEEE Transactions on Pattern Analysis and Machine Intelligence | Pub Date: 2024-08-19 | DOI: 10.1109/TPAMI.2024.3445582
Jiahao Xia; Wenjian Huang; Min Xu; Jianguo Zhang; Haimin Zhang; Ziyu Sheng; Dong Xu
Abstract: Object parts serve as crucial intermediate representations in various downstream tasks, yet part-level representation learning has received far less attention than other vision tasks. Previous research has established that Vision Transformers can learn instance-level attention without labels, extracting high-quality instance-level representations that boost downstream tasks. In this paper, we achieve unsupervised part-specific attention learning with a novel paradigm and further employ the part representations to improve part discovery performance. Specifically, paired images are generated from the same image with different geometric transformations, and multiple part representations are extracted from these paired images using a novel module, named PartFormer. The part representations from the paired images are then exchanged to improve invariance to the geometric transformations. Subsequently, the part representations are aligned with the feature map extracted by a feature map encoder, achieving high similarity with the pixel representations of the corresponding part regions and low similarity in irrelevant regions. Finally, geometric and semantic constraints are applied to the part representations through the intermediate alignment results for part-specific attention learning, encouraging the PartFormer to focus locally and the part representations to explicitly encode the information of the corresponding parts. Moreover, the aligned part representations can further serve as a set of reliable detectors at test time, predicting pixel masks for part discovery. Extensive experiments on four widely used datasets demonstrate that the proposed method achieves competitive performance and robustness thanks to its part-specific attention.
Vol. 46, no. 12, pp. 10597-10613.
Citations: 0
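The final "part representations as detectors" step can be illustrated with plain cosine similarity: each pixel of the feature map is assigned to the part token it matches best. A minimal sketch of that matching (background/threshold handling omitted; the variable names are illustrative):

import numpy as np

def part_masks(feature_map, part_tokens):
    # feature_map: (H, W, C) pixel representations from the feature map encoder.
    # part_tokens: (P, C) aligned part representations (e.g. from PartFormer).
    F = feature_map / (np.linalg.norm(feature_map, axis=-1, keepdims=True) + 1e-12)
    T = part_tokens / (np.linalg.norm(part_tokens, axis=-1, keepdims=True) + 1e-12)
    sim = F @ T.T                                # (H, W, P) cosine similarities
    return sim.argmax(axis=-1)                   # per-pixel part index as a mask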
SEA++: Multi-Graph-Based Higher-Order Sensor Alignment for Multivariate Time-Series Unsupervised Domain Adaptation
IEEE Transactions on Pattern Analysis and Machine Intelligence | Pub Date: 2024-08-16 | DOI: 10.1109/TPAMI.2024.3444904
Yucheng Wang; Yuecong Xu; Jianfei Yang; Min Wu; Xiaoli Li; Lihua Xie; Zhenghua Chen
Abstract: Unsupervised Domain Adaptation (UDA) methods have been successful in reducing label dependency by minimizing the domain discrepancy between labeled source domains and unlabeled target domains. However, these methods face challenges on Multivariate Time-Series (MTS) data. MTS data typically originate from multiple sensors, each with its own distribution, which makes it difficult to adapt existing UDA techniques: they mainly align global features while overlooking distribution discrepancies at the sensor level, limiting their effectiveness on MTS data. To address this issue, we formulate the practical scenario of Multivariate Time-Series Unsupervised Domain Adaptation (MTS-UDA). In this paper, we propose SEnsor Alignment (SEA) for MTS-UDA, aiming to address domain discrepancy at both the local and global sensor levels. At the local sensor level, we design endo-feature alignment, which aligns sensor features and their correlations across domains. To reduce domain discrepancy at the global sensor level, we design exo-feature alignment, which enforces restrictions on global sensor features. We further extend SEA to SEA++ by enhancing the endo-feature alignment, incorporating multi-graph-based higher-order alignment of both sensor features and their correlations. Extensive empirical results demonstrate the state-of-the-art performance of SEA and SEA++ on six public MTS datasets for MTS-UDA.
Vol. 46, no. 12, pp. 10781-10796.
Citations: 0
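The difference between global and sensor-level alignment is easiest to see in code: instead of one discrepancy term over the whole feature vector, compute one term per sensor. A sketch using a linear-kernel MMD (mean-feature) discrepancy as a stand-in for SEA++'s actual endo-/exo-feature losses:

import numpy as np

def per_sensor_alignment_loss(src_feats, tgt_feats):
    # src_feats, tgt_feats: (batch, n_sensors, dim) per-sensor features
    # extracted from source-domain and target-domain MTS samples.
    mu_s = src_feats.mean(axis=0)                # (n_sensors, dim) source means
    mu_t = tgt_feats.mean(axis=0)                # (n_sensors, dim) target means
    # One squared-distance term per sensor, averaged over sensors.
    return float(((mu_s - mu_t) ** 2).sum(axis=1).mean())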
T-Net++: Effective Permutation-Equivariance Network for Two-View Correspondence Pruning
IEEE Transactions on Pattern Analysis and Machine Intelligence | Pub Date: 2024-08-16 | DOI: 10.1109/TPAMI.2024.3444457
Guobao Xiao; Xin Liu; Zhen Zhong; Xiaoqin Zhang; Jiayi Ma; Haibin Ling
Abstract: We propose a conceptually novel, flexible, and effective framework, named T-Net++, for the task of two-view correspondence pruning. T-Net++ comprises two unique structures: the "−" structure and the "|" structure. The "−" structure processes correspondences with an iterative learning strategy, while the "|" structure integrates all feature information from the "−" structure and produces inlier weights. Moreover, within the "|" structure, we design a new Local-Global Attention Fusion module to fully exploit the valuable information obtained from concatenated features through channel-wise and spatial-wise relationships. Furthermore, we develop a Channel-Spatial Squeeze-and-Excitation module, a modified network backbone that enhances the representation of important channels and correspondences through the squeeze-and-excitation operation. T-Net++ not only preserves permutation-equivariance for correspondence pruning, but also gathers rich contextual information, thereby enhancing the effectiveness of the network. Experimental results demonstrate that T-Net++ outperforms state-of-the-art correspondence pruning methods on various benchmarks and excels in two extended tasks.
Vol. 46, no. 12, pp. 10629-10644.
Citations: 0
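The squeeze-and-excitation operation that the Channel-Spatial module builds on is standard: squeeze a per-channel statistic, pass it through a small gated bottleneck, and rescale the channels. A sketch of the vanilla channel-wise version for a set of correspondences (T-Net++'s variant extends this idea to the spatial/correspondence axis as well):

import numpy as np

def squeeze_excite(X, W1, W2):
    # X: (n_correspondences, channels) features; W1: (channels, channels // r)
    # and W2: (channels // r, channels) are learned bottleneck weights.
    s = X.mean(axis=0)                           # squeeze: average per channel
    h = np.maximum(s @ W1, 0.0)                  # excitation: ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(h @ W2)))       # sigmoid gate in (0, 1)
    return X * gate                              # emphasize important channels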
Metric3D v2: A Versatile Monocular Geometric Foundation Model for Zero-Shot Metric Depth and Surface Normal Estimation
IEEE Transactions on Pattern Analysis and Machine Intelligence | Pub Date: 2024-08-16 | DOI: 10.1109/TPAMI.2024.3444912
Mu Hu; Wei Yin; Chi Zhang; Zhipeng Cai; Xiaoxiao Long; Hao Chen; Kaixuan Wang; Gang Yu; Chunhua Shen; Shaojie Shen
Abstract: We introduce Metric3D v2, a geometric foundation model for zero-shot metric depth and surface normal estimation from single images, both critical for accurate 3D recovery. Depth and normal estimation, though complementary, present distinct challenges. State-of-the-art monocular depth methods achieve zero-shot generalization through affine-invariant depths but fail to recover real-world metric scale, while current normal estimation techniques struggle with zero-shot performance due to insufficient labeled data. We propose targeted solutions for both metric depth and normal estimation. For metric depth, we present a canonical camera space transformation module that resolves metric ambiguity across various camera models and large-scale datasets and can be easily integrated into existing monocular models. For surface normal estimation, we introduce a joint depth-normal optimization module that leverages diverse metric-depth data, allowing normal estimators to improve beyond traditional labels. Trained on over 16 million images from thousands of camera models with varied annotations, our model excels in zero-shot generalization to new camera settings. As shown in Fig. 1, it ranks first on multiple zero-shot and standard benchmarks for metric depth and surface normal prediction. Our method enables the accurate recovery of metric 3D structure on randomly collected internet images, paving the way for plausible single-image metrology. It also relieves the scale-drift issue of monocular SLAM (Fig. 3), leading to high-quality metric-scale dense mapping. These applications highlight the versatility of Metric3D v2 as a geometric foundation model.
Vol. 46, no. 12, pp. 10579-10596.
Citations: 0
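One common way to realize a canonical camera space transformation, consistent with the abstract's description (the exact rule used by Metric3D v2 may differ), is to rescale metric depth by the ratio of a fixed canonical focal length to the source camera's focal length, so that training data from any camera behaves as if captured by a single canonical camera:

def to_canonical_depth(depth_m, focal_px, canonical_focal_px=1000.0):
    # depth_m: metric depth value(s); focal_px: source camera focal length
    # in pixels; canonical_focal_px: an assumed canonical constant.
    # Predictions made in canonical space are mapped back at inference by
    # the inverse ratio, restoring the true metric scale.
    return depth_m * canonical_focal_px / focal_px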