{"title":"End-to-End Autonomous Driving without Costly Modularization and 3D Manual Annotation","authors":"Mingzhe Guo, Zhipeng Zhang, Yuan He, Ke Wang, Liping Jing, Haibin Ling","doi":"10.1109/tpami.2025.3610517","DOIUrl":"https://doi.org/10.1109/tpami.2025.3610517","url":null,"abstract":"","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"2 1","pages":""},"PeriodicalIF":23.6,"publicationDate":"2025-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145127455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Translating Images to Road Network: A Sequence-to-Sequence Perspective.","authors":"Jiachen Lu,Ming Nie,Bozhou Zhang,Renyuan Peng,Xinyue Cai,Hang Xu,Feng Wen,Wei Zhang,Li Zhang","doi":"10.1109/tpami.2025.3612940","DOIUrl":"https://doi.org/10.1109/tpami.2025.3612940","url":null,"abstract":"The extraction of road network is essential for the generation of high-definition maps since it enables the precise localization of road landmarks and their interconnections. However, generating road network poses a significant challenge due to the conflicting underlying combination of Euclidean (e.g., road landmarks location) and non-Euclidean (e.g., road topological connectivity) structures. Existing methods struggle to merge the two types of data domains effectively, but few of them address it properly. Instead, our work establishes a unified representation of both types of data domain by projecting both Euclidean and non- Euclidean data into an integer series called RoadNet Sequence. Further than modeling an auto-regressive sequence-to-sequence Transformer model to understand RoadNet Sequence, we decouple the dependency of RoadNet Sequence into a mixture of autoregressive and non-autoregressive dependency. Building on this, our proposed non-autoregressive sequence-to-sequence approach leverages non-autoregressive dependencies while fixing the gap towards auto-regressive dependencies, resulting in success in both efficiency and accuracy. We further identify two main bottlenecks in the current RoadNetTransformer on a non-overfitting split of the dataset: poor landmark detection limited by the BEV Encoder and error propagation to topology reasoning. Therefore, we propose Topology-Inherited Training to inherit better topology knowledge into RoadNetTransformer. Additionally, we collect SD-Maps from open-source map datasets and use this prior information to significantly improve landmark detection and reachability. Extensive experiments on the nuScenes dataset demonstrate the superiority of RoadNet Sequence representation and the non-autoregressive approach compared to existing stateof- the-art alternatives. Our code is publicly available at opensource https://github.com/fudan-zvg/RoadNetworkTRansformer.","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"22 1","pages":""},"PeriodicalIF":23.6,"publicationDate":"2025-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145127196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MGAF: LiDAR-Camera 3D Object Detection with Multiple Guidance and Adaptive Fusion.","authors":"Baojie Fan,Xiaotian Li,Yuhan Zhou,Caixia Xia,Huijie Fan,Fengyu Xu,Jiandong Tian","doi":"10.1109/tpami.2025.3612958","DOIUrl":"https://doi.org/10.1109/tpami.2025.3612958","url":null,"abstract":"Recent years have witnessed the remarkable progress of 3D multi-modality object detection methods based on the Bird's-Eye-View (BEV) perspective. However, most of them overlook the complementary interaction and guidance between LiDAR and camera. In this work, we propose a novel multi-modality 3D objection detection method, with multi-guided global interaction and LiDAR-guided adaptive fusion, named MGAF. Specifically, we introduce sparse depth guidance (SDG) and LiDAR occupancy guidance (LOG) to generate 3D features with sufficient depth and spatial information. The designed semantic segmentation network captures category and orientation prior information for raw point clouds. In the following, an Adaptive Fusion Dual Transformer (AFDT) is developed to adaptively enhance the interaction of different modal BEV features from both global and bidirectional perspectives. Meanwhile, additional downsampling with sparse height compression and multi-scale dual-path transformer (MSDPT) are designed in order to enlarge the receptive fields of different modal features. Finally, a temporal fusion module is introduced to aggregate features from previous frames. Notably, the proposed AFDT is general, which also shows superior performance on other models. Our framework has undergone extensive experimentation on the large-scale nuScenes dataset, Waymo Open Dataset, and long-range Argoverse2 dataset, consistently demonstrating state-of-the-art performance. The code will be released at:https://github.com/xioatian1/MGAF. 3D object detection, multi-modality, multiple guidance, adaptive fusion, BEV representation, autonomous driving.","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"51 1","pages":""},"PeriodicalIF":23.6,"publicationDate":"2025-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145117074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Optimal Mixture of Experts System for 3D Object Detection: A Game of Accuracy, Efficiency and Adaptivity.","authors":"Linshen Liu,Pu Wang,Guanlin Wu,Junyue Jiang,Hao Yang","doi":"10.1109/tpami.2025.3611795","DOIUrl":"https://doi.org/10.1109/tpami.2025.3611795","url":null,"abstract":"Autonomous vehicles, open-world robots, and other automated systems rely on accurate, efficient perception modules for real-time object detection. Although high-precision models improve reliability, their processing time and computational overhead can hinder real-time performance and raise safety concerns. This paper introduces an Edge-based Mixture-of-Experts Optimal Sensing (EMOS) System that addresses the challenge of co-achieving accuracy, latency and scene adaptivity, further demonstrated in the open-world autonomous driving scenarios. Algorithmically, EMOS fuses multimodal sensor streams via an Adaptive Multimodal Data Bridge and uses a scenario-aware MoE switch to activate only a complementary set of specialized experts as needed. The proposed hierarchical backpropagation and a multiscale pooling layer let model capacity scale with real-world demand complexity. System-wise, an edge-optimized runtime with accelerator-aware scheduling (e.g., ONNX/TensorRT), zero-copy buffering, and overlapped I/O-compute enforces explicit latency/accuracy budgets across diverse driving conditions. Experimental results establish EMOS as the new state of the art: on KITTI, it increases average AP by 3.17% while running $2.6times$ faster on Nvidia Jetson. On nuScenes, it improves accuracy by 0.2% mAP and 0.5% NDS, with 34% fewer parameters and a $15.35times$ Nvidia Jetson speedup. Leveraging multimodal data and intelligent experts cooperation, EMOS delivers accurate, efficient and edge-adaptive perception system for autonomous vehicles, thereby ensuring robust, timely responses in real-world scenarios.","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"87 1","pages":""},"PeriodicalIF":23.6,"publicationDate":"2025-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145117071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ADA-Track++: End-to-End Multi-Camera 3D Multi-Object Tracking with Alternating Detection and Association","authors":"Shuxiao Ding, Lukas Schneider, Marius Cordts, Juergen Gall","doi":"10.1109/tpami.2025.3613269","DOIUrl":"https://doi.org/10.1109/tpami.2025.3613269","url":null,"abstract":"","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"41 1","pages":""},"PeriodicalIF":23.6,"publicationDate":"2025-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145116227","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models","authors":"Anke Tang, Li Shen, Yong Luo, Shuai Xie, Han Hu, Lefei Zhang, Bo Du, Dacheng Tao","doi":"10.1109/tpami.2025.3612480","DOIUrl":"https://doi.org/10.1109/tpami.2025.3612480","url":null,"abstract":"","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"18 1","pages":""},"PeriodicalIF":23.6,"publicationDate":"2025-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145116229","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}