{"title":"End-to-End Autonomous Driving without Costly Modularization and 3D Manual Annotation","authors":"Mingzhe Guo, Zhipeng Zhang, Yuan He, Ke Wang, Liping Jing, Haibin Ling","doi":"10.1109/tpami.2025.3610517","DOIUrl":"https://doi.org/10.1109/tpami.2025.3610517","url":null,"abstract":"","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"2 1","pages":""},"PeriodicalIF":23.6,"publicationDate":"2025-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145127455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Translating Images to Road Network: A Sequence-to-Sequence Perspective.","authors":"Jiachen Lu,Ming Nie,Bozhou Zhang,Renyuan Peng,Xinyue Cai,Hang Xu,Feng Wen,Wei Zhang,Li Zhang","doi":"10.1109/tpami.2025.3612940","DOIUrl":"https://doi.org/10.1109/tpami.2025.3612940","url":null,"abstract":"The extraction of road network is essential for the generation of high-definition maps since it enables the precise localization of road landmarks and their interconnections. However, generating road network poses a significant challenge due to the conflicting underlying combination of Euclidean (e.g., road landmarks location) and non-Euclidean (e.g., road topological connectivity) structures. Existing methods struggle to merge the two types of data domains effectively, but few of them address it properly. Instead, our work establishes a unified representation of both types of data domain by projecting both Euclidean and non- Euclidean data into an integer series called RoadNet Sequence. Further than modeling an auto-regressive sequence-to-sequence Transformer model to understand RoadNet Sequence, we decouple the dependency of RoadNet Sequence into a mixture of autoregressive and non-autoregressive dependency. Building on this, our proposed non-autoregressive sequence-to-sequence approach leverages non-autoregressive dependencies while fixing the gap towards auto-regressive dependencies, resulting in success in both efficiency and accuracy. We further identify two main bottlenecks in the current RoadNetTransformer on a non-overfitting split of the dataset: poor landmark detection limited by the BEV Encoder and error propagation to topology reasoning. Therefore, we propose Topology-Inherited Training to inherit better topology knowledge into RoadNetTransformer. Additionally, we collect SD-Maps from open-source map datasets and use this prior information to significantly improve landmark detection and reachability. Extensive experiments on the nuScenes dataset demonstrate the superiority of RoadNet Sequence representation and the non-autoregressive approach compared to existing stateof- the-art alternatives. Our code is publicly available at opensource https://github.com/fudan-zvg/RoadNetworkTRansformer.","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"22 1","pages":""},"PeriodicalIF":23.6,"publicationDate":"2025-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145127196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MGAF: LiDAR-Camera 3D Object Detection with Multiple Guidance and Adaptive Fusion.","authors":"Baojie Fan,Xiaotian Li,Yuhan Zhou,Caixia Xia,Huijie Fan,Fengyu Xu,Jiandong Tian","doi":"10.1109/tpami.2025.3612958","DOIUrl":"https://doi.org/10.1109/tpami.2025.3612958","url":null,"abstract":"Recent years have witnessed the remarkable progress of 3D multi-modality object detection methods based on the Bird's-Eye-View (BEV) perspective. However, most of them overlook the complementary interaction and guidance between LiDAR and camera. In this work, we propose a novel multi-modality 3D objection detection method, with multi-guided global interaction and LiDAR-guided adaptive fusion, named MGAF. Specifically, we introduce sparse depth guidance (SDG) and LiDAR occupancy guidance (LOG) to generate 3D features with sufficient depth and spatial information. The designed semantic segmentation network captures category and orientation prior information for raw point clouds. In the following, an Adaptive Fusion Dual Transformer (AFDT) is developed to adaptively enhance the interaction of different modal BEV features from both global and bidirectional perspectives. Meanwhile, additional downsampling with sparse height compression and multi-scale dual-path transformer (MSDPT) are designed in order to enlarge the receptive fields of different modal features. Finally, a temporal fusion module is introduced to aggregate features from previous frames. Notably, the proposed AFDT is general, which also shows superior performance on other models. Our framework has undergone extensive experimentation on the large-scale nuScenes dataset, Waymo Open Dataset, and long-range Argoverse2 dataset, consistently demonstrating state-of-the-art performance. The code will be released at:https://github.com/xioatian1/MGAF. 3D object detection, multi-modality, multiple guidance, adaptive fusion, BEV representation, autonomous driving.","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"51 1","pages":""},"PeriodicalIF":23.6,"publicationDate":"2025-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145117074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Optimal Mixture of Experts System for 3D Object Detection: A Game of Accuracy, Efficiency and Adaptivity.","authors":"Linshen Liu,Pu Wang,Guanlin Wu,Junyue Jiang,Hao Yang","doi":"10.1109/tpami.2025.3611795","DOIUrl":"https://doi.org/10.1109/tpami.2025.3611795","url":null,"abstract":"Autonomous vehicles, open-world robots, and other automated systems rely on accurate, efficient perception modules for real-time object detection. Although high-precision models improve reliability, their processing time and computational overhead can hinder real-time performance and raise safety concerns. This paper introduces an Edge-based Mixture-of-Experts Optimal Sensing (EMOS) System that addresses the challenge of co-achieving accuracy, latency and scene adaptivity, further demonstrated in the open-world autonomous driving scenarios. Algorithmically, EMOS fuses multimodal sensor streams via an Adaptive Multimodal Data Bridge and uses a scenario-aware MoE switch to activate only a complementary set of specialized experts as needed. The proposed hierarchical backpropagation and a multiscale pooling layer let model capacity scale with real-world demand complexity. System-wise, an edge-optimized runtime with accelerator-aware scheduling (e.g., ONNX/TensorRT), zero-copy buffering, and overlapped I/O-compute enforces explicit latency/accuracy budgets across diverse driving conditions. Experimental results establish EMOS as the new state of the art: on KITTI, it increases average AP by 3.17% while running $2.6times$ faster on Nvidia Jetson. On nuScenes, it improves accuracy by 0.2% mAP and 0.5% NDS, with 34% fewer parameters and a $15.35times$ Nvidia Jetson speedup. Leveraging multimodal data and intelligent experts cooperation, EMOS delivers accurate, efficient and edge-adaptive perception system for autonomous vehicles, thereby ensuring robust, timely responses in real-world scenarios.","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"87 1","pages":""},"PeriodicalIF":23.6,"publicationDate":"2025-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145117071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ADA-Track++: End-to-End Multi-Camera 3D Multi-Object Tracking with Alternating Detection and Association","authors":"Shuxiao Ding, Lukas Schneider, Marius Cordts, Juergen Gall","doi":"10.1109/tpami.2025.3613269","DOIUrl":"https://doi.org/10.1109/tpami.2025.3613269","url":null,"abstract":"","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"41 1","pages":""},"PeriodicalIF":23.6,"publicationDate":"2025-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145116227","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models","authors":"Anke Tang, Li Shen, Yong Luo, Shuai Xie, Han Hu, Lefei Zhang, Bo Du, Dacheng Tao","doi":"10.1109/tpami.2025.3612480","DOIUrl":"https://doi.org/10.1109/tpami.2025.3612480","url":null,"abstract":"","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"18 1","pages":""},"PeriodicalIF":23.6,"publicationDate":"2025-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145116229","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}