IEEE Transactions on Pattern Analysis and Machine Intelligence: Latest Articles

Toward Resolution Mismatching: Modality-Aware Feature-Aligned Network for Pan-Sharpening
IF 18.6
IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2025-08-01. DOI: 10.1109/TPAMI.2025.3594898
Man Zhou; Xuanhua He; Danfeng Hong
Abstract: Panchromatic (PAN) and multi-spectral (MS) remote sensing image fusion, known as pan-sharpening, aims to produce high-resolution MS images by combining complementary information from the high-resolution, texture-rich PAN images and the low-spatial-resolution but high-spectral-resolution MS counterparts. Despite notable advances in this field, current state-of-the-art pan-sharpening techniques do not explicitly address the spatial resolution mismatch between the PAN and MS modalities. This mismatch can cause misaligned feature representations and blurry artifacts in the model output, ultimately hindering the generation of high-frequency textures and limiting performance. To address this problem, we propose a novel modality-aware feature-aligned pan-sharpening framework comprising three stages: modality-aware feature extraction, modality-aware feature alignment, and context-integrated image reconstruction. First, we introduce a half-instance normalization backbone to filter out inconsistent features and promote the learning of consistent features between the PAN and MS modalities. Second, a learnable modality-aware feature interpolation is devised to address the misalignment issue: the features extracted by the backbone are integrated to predict per-pixel transformation offsets, allowing adaptive selection of contextual information and better alignment of the modality-aware features. Finally, within the context of interactive offset correction, multi-stage information is aggregated to generate the final pan-sharpened output. Extensive experiments on multiple satellite datasets demonstrate that the proposed algorithm outperforms other state-of-the-art methods both qualitatively and quantitatively, and generalizes well to real-world scenes.
Vol. 47, No. 11, pp. 10753-10769.
Citations: 0
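The alignment step above warps features according to predicted per-pixel offsets. Below is a minimal NumPy sketch of that general idea, bilinear sampling of a feature map at offset locations; the offset-prediction network itself is omitted, and all names are illustrative rather than the authors' implementation:

```python
import numpy as np

def bilinear_sample(feat, ys, xs):
    """Sample feat (H, W, C) at fractional coordinates (ys, xs), clamping to borders."""
    H, W, _ = feat.shape
    y0 = np.clip(np.floor(ys).astype(int), 0, H - 1)
    x0 = np.clip(np.floor(xs).astype(int), 0, W - 1)
    y1 = np.clip(y0 + 1, 0, H - 1)
    x1 = np.clip(x0 + 1, 0, W - 1)
    wy = np.clip(ys - y0, 0.0, 1.0)[..., None]  # fractional weights, broadcast over channels
    wx = np.clip(xs - x0, 0.0, 1.0)[..., None]
    top = feat[y0, x0] * (1 - wx) + feat[y0, x1] * wx
    bot = feat[y1, x0] * (1 - wx) + feat[y1, x1] * wx
    return top * (1 - wy) + bot * wy

def align_features(feat, offsets):
    """Warp feat (H, W, C) by per-pixel offsets (H, W, 2) ordered as (dy, dx)."""
    H, W, _ = feat.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(float)
    return bilinear_sample(feat, ys + offsets[..., 0], xs + offsets[..., 1])
```

With zero offsets the warp is the identity; a learned network would predict `offsets` from the concatenated PAN/MS features.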
Decouple Before Align: Visual Disentanglement Enhances Prompt Tuning
IF 18.6
IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2025-08-01. DOI: 10.1109/TPAMI.2025.3594894
Fei Zhang; Tianfei Zhou; Jiangchao Yao; Ya Zhang; Ivor W. Tsang; Yanfeng Wang
Abstract: Prompt tuning (PT), an emerging resource-efficient fine-tuning paradigm, has shown remarkable effectiveness in improving the task-specific transferability of vision-language models. This paper delves into a previously overlooked information asymmetry issue in PT: the visual modality mostly conveys more context than the object-oriented textual modality. Coarsely aligning these two modalities can therefore result in biased attention, driving the model to focus merely on the context area. To address this, we propose DAPT, an effective PT framework based on an intuitive decouple-before-align concept. First, we explicitly decouple the visual modality into foreground and background representations by exploiting coarse-and-fine visual segmentation cues; both decoupled patterns are then aligned with the original foreground texts and hand-crafted background classes, symmetrically strengthening the modal alignment. To further enhance visual concentration, we propose a visual pull-push regularization tailored to the foreground-background patterns, directing the original visual representation toward unbiased attention on the region-of-interest object. We demonstrate the power of architecture-free DAPT through few-shot learning, base-to-novel generalization, and data-efficient learning, all of which yield superior performance across prevailing benchmarks.
Vol. 47, No. 11, pp. 10619-10632.
Citations: 0
Robust Density Peaks Clustering for Manifold Data With Multiple Peaks
IF 18.6
IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2025-07-31. DOI: 10.1109/TPAMI.2025.3594121
Ling Ding; Chao Li; Shifei Ding; Xiao Xu; Lili Guo; Xindong Wu
Abstract: Density peaks clustering (DPC) is an effective clustering algorithm that requires no prior knowledge. However, DPC still has the following shortcomings: (1) the Euclidean distance it uses is not suitable for manifold data with multiple peaks; (2) its local density calculation is too simple, and the final results may fluctuate with the cutoff distance d_c; (3) manually selecting centers from the decision graph may yield the wrong number of clusters and poor performance. To address these shortcomings, a robust density peaks clustering algorithm for manifold data with multiple peaks (RDPCM) is proposed, reducing the sensitivity of the clustering results to parameters. Motivated by DPC-GD, RDPCM replaces the Euclidean distance with a geodesic distance computed over an improved mutual k-nearest-neighbor graph, which better captures the local manifold structure of the data and yields excellent results. In addition, a Davies-Bouldin index based on the minimum spanning tree (MDBI) is proposed to select the number of clusters adaptively. Numerous experiments establish that RDPCM is more effective than other advanced clustering algorithms.
Vol. 47, No. 11, pp. 10696-10708.
Citations: 0
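The geodesic-distance idea behind RDPCM can be illustrated with the classic DPC quantities. Here is a minimal NumPy sketch that uses a plain symmetric k-NN graph and Floyd-Warshall shortest paths (a simplification of the paper's improved mutual k-NN construction), then computes the standard DPC density rho and distance-to-denser-point delta:

```python
import numpy as np

def geodesic_distances(X, k=3):
    """All-pairs geodesic distance over a symmetric k-NN graph (Floyd-Warshall)."""
    n = len(X)
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    for i in range(n):
        for j in np.argsort(D[i])[1:k + 1]:   # connect each point to its k nearest neighbors
            G[i, j] = G[j, i] = D[i, j]
    for m in range(n):                        # Floyd-Warshall relaxation
        G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])
    return G

def dpc_scores(G, dc):
    """Classic DPC quantities: local density rho and distance-to-denser-point delta."""
    rho = np.exp(-(G / dc) ** 2).sum(axis=1) - 1.0   # Gaussian-kernel density (minus self)
    delta = np.empty_like(rho)
    for i in range(len(rho)):
        denser = rho > rho[i]
        delta[i] = G[i, denser].min() if denser.any() else G[i][np.isfinite(G[i])].max()
    return rho, delta
```

Points with simultaneously large rho and delta are the density peaks; on disconnected manifolds the geodesic distance between components is infinite, so each component exposes its own peak.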
Deep Learning-Based Point Cloud Compression: An In-Depth Survey and Benchmark
IF 18.6
IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2025-07-31. DOI: 10.1109/TPAMI.2025.3594355
Wei Gao; Liang Xie; Songlin Fan; Ge Li; Shan Liu; Wen Gao
Abstract: With the maturity of 3D capture technology, the explosive growth of point cloud data has burdened storage and transmission. Traditional hybrid point cloud compression (PCC) tools relying on handcrafted priors offer limited compression performance and are increasingly unable to cope with this data growth. Recently, deep learning-based PCC methods have pushed the performance boundary further. With deep PCC thriving, the community urgently needs a systematic overview that consolidates past progress and identifies future research directions. This paper provides a detailed review covering popular point cloud datasets, algorithm evolution, benchmarking analysis, and future trends. Concretely, we first introduce several widely used PCC datasets according to their major properties. We then review the evolution of deep PCC algorithms, both lossy and lossless, proposed for various point cloud types. Beyond academic studies, we also examine the development of relevant international standards (i.e., MPEG and JPEG standards). To facilitate an in-depth understanding of deep PCC, we select a representative set of methods and conduct extensive experiments on multiple datasets; comprehensive benchmarking comparisons and analysis reveal the pros and cons of previous methods. Finally, based on this analysis, we highlight the challenges and future trends of deep learning-based PCC, paving the way for further study.
Vol. 47, No. 11, pp. 10731-10752.
Citations: 0
SA3Det++: Side-Aware Quality Estimation for Semi-Supervised 3D Object Detection
IF 18.6
IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2025-07-31. DOI: 10.1109/TPAMI.2025.3594086
Wenfei Yang; Chuxin Wang; Tianzhu Zhang; Yongdong Zhang; Feng Wu
Abstract: Semi-supervised 3D object detection from point clouds aims to train a detector with a small amount of labeled data and a large amount of unlabeled data. Among existing methods, pseudo-label-based approaches have achieved superior performance; their core lies in selecting high-quality pseudo-labels under a designed quality-evaluation criterion. Despite their success, these methods all estimate localization and classification quality from a global perspective. For localization quality, they use a global score threshold to filter out low-quality pseudo-labels and assign equal importance to each side of a box during training, ignoring the fact that sides with different localization quality should not be treated equally. Moreover, many pseudo-labels are discarded by the high global threshold even though they may contain correctly predicted sides that would help model training. For classification quality, existing methods usually combine the objectness score and the classification confidence score to filter pseudo-labels, focusing on designing effective confidence metrics while neglecting the importance of predicting a better objectness score. In this paper, we propose SA3Det++, a side-aware quality estimation method for semi-supervised 3D object detection, consisting of a probabilistic side localization strategy, a side-aware quality estimation strategy, and a soft pseudo-label selection strategy. Extensive results demonstrate that the proposed method consistently outperforms baseline methods across different scenes and evaluation criteria.
Vol. 47, No. 11, pp. 10664-10679.
Citations: 0
Human Motion Video Generation: A Survey
IF 18.6
IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2025-07-31. DOI: 10.1109/TPAMI.2025.3594034
Haiwei Xue; Xiangyang Luo; Zhanghao Hu; Xin Zhang; Xunzhi Xiang; Yuqin Dai; Jianzhuang Liu; Zhensong Zhang; Minglei Li; Jian Yang; Fei Ma; Zhiyong Wu; Changpeng Yang; Zonghong Dai; Fei Richard Yu
Abstract: Human motion video generation has garnered significant research interest due to its broad applications, enabling innovations such as photorealistic singing heads and dynamic avatars that dance to music. Existing surveys in this field, however, focus on individual methods and lack a comprehensive overview of the entire generative process. This paper addresses that gap with an in-depth survey of human motion video generation, encompassing more than ten sub-tasks and detailing the five key phases of the generation process: input, motion planning, motion video generation, refinement, and output. Notably, this is the first survey to discuss the potential of large language models for enhancing human motion video generation. Our survey reviews the latest developments and technological trends across three primary modalities: vision, text, and audio. Covering over two hundred papers, we offer a thorough overview of the field and highlight milestone works that have driven significant technological breakthroughs. Our goal is to unveil the prospects of human motion video generation and to serve as a valuable resource for advancing the comprehensive applications of digital humans.
Vol. 47, No. 11, pp. 10709-10730.
Citations: 0
Computational and Statistical Guarantees for Tensor-on-Tensor Regression With Tensor Train Decomposition
IF 18.6
IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2025-07-31. DOI: 10.1109/TPAMI.2025.3593840
Zhen Qin; Zhihui Zhu
Abstract: Recently, a tensor-on-tensor (ToT) regression model has been proposed to generalize tensor recovery, encompassing scenarios such as scalar-on-tensor regression and tensor-on-vector regression. However, the exponential growth in tensor complexity poses storage and computation challenges for ToT regression. To overcome this hurdle, tensor decompositions have been introduced; among them, the tensor train (TT)-based ToT model has proven efficient in practice thanks to reduced memory requirements, enhanced computational efficiency, and decreased sampling complexity. Despite these practical benefits, a gap remains between theoretical analysis and real-world performance. In this paper, we study the theoretical and algorithmic aspects of the TT-based ToT regression model. Assuming the regression operator satisfies the restricted isometry property (RIP), we conduct an error analysis for the solution to a constrained least-squares optimization problem, including an upper error bound and a minimax lower bound, and show that these error bounds depend polynomially on the order $N+M$. To efficiently find solutions meeting such error bounds, we propose two optimization algorithms: an iterative hard thresholding (IHT) algorithm, which performs gradient descent followed by TT singular value decomposition (TT-SVD), and a factorization approach based on the Riemannian gradient descent (RGD) algorithm. When the RIP is satisfied, spectral initialization provides a proper starting point, and we establish the linear convergence rate of both IHT and RGD. Notably, IHT optimizes the entire tensor at each iteration and maintains the TT structure through TT-SVD, which poses a storage challenge in practice; RGD instead optimizes factors in the so-called left-orthogonal TT format, enforcing orthonormality among most of the factors over the Stiefel manifold, thereby reducing the storage complexity of IHT. This reduction in storage comes at a cost: the recovery of RGD is worse than that of IHT, while the error bounds of both algorithms depend polynomially on $N+M$. Experimental validation substantiates our theoretical findings.
Vol. 47, No. 11, pp. 10577-10587.
Citations: 0
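The IHT algorithm described above alternates a gradient step on the least-squares objective with a hard-thresholding projection back onto the low-rank set (TT-SVD in the paper). As a hedged illustration of the same template in the simpler matrix case, with rank-r SVD truncation standing in for TT-SVD and a random Gaussian measurement operator, all names illustrative:

```python
import numpy as np

def svd_project(X, r):
    """Hard-threshold X to its best rank-r approximation (the TT analogue is TT-SVD)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

def iht_lowrank(A, y, shape, r, step, iters=300):
    """Iterative hard thresholding: gradient step on ||A x - y||^2, then rank-r projection."""
    X = np.zeros(shape)
    for _ in range(iters):
        grad = A.T @ (A @ X.ravel() - y)          # least-squares gradient
        X = svd_project(X - step * grad.reshape(shape), r)
    return X
```

With enough well-conditioned measurements (a matrix stand-in for the RIP assumption) and a step size below 2 over the operator's largest squared singular value, the iterates converge linearly to the planted low-rank matrix.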
Graph Spiking Attention Network: Sparsity, Efficiency and Robustness
IF 18.6
IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2025-07-30. DOI: 10.1109/TPAMI.2025.3593912
Beibei Wang; Bo Jiang; Jin Tang; Lu Bai; Bin Luo
Abstract: Existing graph attention networks (GATs) generally adopt the self-attention mechanism to learn graph edge attention, which usually returns dense attention coefficients over all neighbors and is thus prone to sensitivity to graph edge noise. To overcome this problem, sparse GATs are desirable and have garnered increasing interest in recent years. However, existing sparse GATs usually suffer from high training complexity and are not straightforward to apply to inductive learning tasks. To address these issues, we propose to learn sparse GATs by exploiting the spiking neuron (SN) mechanism, termed Graph Spiking Attention (GSAT). Spiking neurons perform inexpensive information processing by converting input data into discrete spike trains and returning sparse outputs. Inspired by this, GSAT exploits spiking neurons to learn sparse attention coefficients, yielding an edge-sparsified graph for GNNs. GSAT can therefore perform message passing over selected neighbors naturally, making it compact and robust with respect to graph noise, and it can be used straightforwardly for inductive learning tasks. Extensive experiments on both transductive and inductive tasks demonstrate the effectiveness, robustness, and efficiency of GSAT.
Vol. 47, No. 11, pp. 10862-10869.
Citations: 0
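The core effect described above, a spiking gate turning dense attention coefficients into sparse ones, can be sketched in a few lines of NumPy. This is a simplified stand-in (a Heaviside firing threshold on raw logits followed by a masked softmax), not the paper's trained spiking-neuron dynamics, and `theta` here is an illustrative threshold parameter:

```python
import numpy as np

def spiking_sparse_attention(scores, adj, theta=0.0):
    """Sparsify attention with a spiking-style Heaviside gate, then renormalize.

    scores: (n, n) raw attention logits; adj: (n, n) 0/1 adjacency;
    theta: firing threshold on the logits (a stand-in for the membrane potential threshold).
    """
    spikes = (scores >= theta) & (adj > 0)          # only neighbors whose logit crosses theta fire
    gated = np.where(spikes, np.exp(scores), 0.0)   # masked softmax numerator
    denom = gated.sum(axis=1, keepdims=True)
    denom[denom == 0] = 1.0                         # rows with no firing neighbor: avoid 0/0
    return gated / denom
```

Message passing then aggregates only over the surviving (firing) neighbors, which is what makes the resulting graph edge-sparsified and less sensitive to noisy edges.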
iSEARLE: Improving Textual Inversion for Zero-Shot Composed Image Retrieval
IF 18.6
IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2025-07-29. DOI: 10.1109/TPAMI.2025.3593539
Lorenzo Agnolucci; Alberto Baldrati; Alberto Del Bimbo; Marco Bertini
Abstract: Given a query consisting of a reference image and a relative caption, composed image retrieval (CIR) aims to retrieve target images that are visually similar to the reference while incorporating the changes specified in the relative caption. The reliance of supervised methods on labor-intensive, manually labeled datasets hinders their broad applicability to CIR. In this work, we introduce a new task, zero-shot CIR (ZS-CIR), which addresses CIR without the need for a labeled training dataset. We propose iSEARLE (improved zero-Shot composEd imAge Retrieval with textuaL invErsion), which maps the visual information of the reference image into a pseudo-word token in the CLIP token embedding space and combines it with the relative caption. To foster research on ZS-CIR, we present an open-domain benchmarking dataset named CIRCO (Composed Image Retrieval on Common Objects in context), the first CIR dataset in which each query is labeled with multiple ground truths and a semantic categorization. Experimental results show that iSEARLE achieves state-of-the-art performance on three CIR datasets (FashionIQ, CIRR, and the proposed CIRCO) and in two additional evaluation settings, namely domain conversion and object composition.
Vol. 47, No. 11, pp. 10801-10817.
Citations: 0
MTMamba++: Enhancing Multi-Task Dense Scene Understanding via Mamba-Based Decoders
IF 18.6
IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2025-07-29. DOI: 10.1109/TPAMI.2025.3593621
Baijiong Lin; Weisen Jiang; Pengguang Chen; Shu Liu; Ying-Cong Chen
Abstract: Multi-task dense scene understanding, which trains one model for multiple dense prediction tasks, has a wide range of application scenarios. Capturing long-range dependencies and enhancing cross-task interactions are crucial to multi-task dense prediction. In this paper, we propose MTMamba++, a novel architecture for multi-task scene understanding featuring a Mamba-based decoder. It contains two types of core blocks: the self-task Mamba (STM) block and the cross-task Mamba (CTM) block. STM handles long-range dependencies by leveraging state-space models, while CTM explicitly models task interactions to facilitate information exchange across tasks. We design two variants of the CTM block, F-CTM and S-CTM, which enhance cross-task interaction from the feature and semantic perspectives, respectively. Extensive experiments on the NYUDv2, PASCAL-Context, and Cityscapes datasets demonstrate the superior performance of MTMamba++ over CNN-based, Transformer-based, and diffusion-based methods while maintaining high computational efficiency.
Vol. 47, No. 11, pp. 10633-10645.
Citations: 0