{"title":"Translating Images to Road Network: A Sequence-to-Sequence Perspective.","authors":"Jiachen Lu,Ming Nie,Bozhou Zhang,Renyuan Peng,Xinyue Cai,Hang Xu,Feng Wen,Wei Zhang,Li Zhang","doi":"10.1109/tpami.2025.3612940","DOIUrl":null,"url":null,"abstract":"The extraction of road network is essential for the generation of high-definition maps since it enables the precise localization of road landmarks and their interconnections. However, generating road network poses a significant challenge due to the conflicting underlying combination of Euclidean (e.g., road landmarks location) and non-Euclidean (e.g., road topological connectivity) structures. Existing methods struggle to merge the two types of data domains effectively, but few of them address it properly. Instead, our work establishes a unified representation of both types of data domain by projecting both Euclidean and non- Euclidean data into an integer series called RoadNet Sequence. Further than modeling an auto-regressive sequence-to-sequence Transformer model to understand RoadNet Sequence, we decouple the dependency of RoadNet Sequence into a mixture of autoregressive and non-autoregressive dependency. Building on this, our proposed non-autoregressive sequence-to-sequence approach leverages non-autoregressive dependencies while fixing the gap towards auto-regressive dependencies, resulting in success in both efficiency and accuracy. We further identify two main bottlenecks in the current RoadNetTransformer on a non-overfitting split of the dataset: poor landmark detection limited by the BEV Encoder and error propagation to topology reasoning. Therefore, we propose Topology-Inherited Training to inherit better topology knowledge into RoadNetTransformer. Additionally, we collect SD-Maps from open-source map datasets and use this prior information to significantly improve landmark detection and reachability. Extensive experiments on the nuScenes dataset demonstrate the superiority of RoadNet Sequence representation and the non-autoregressive approach compared to existing stateof- the-art alternatives. Our code is publicly available at opensource https://github.com/fudan-zvg/RoadNetworkTRansformer.","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"22 1","pages":""},"PeriodicalIF":18.6000,"publicationDate":"2025-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Pattern Analysis and Machine Intelligence","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1109/tpami.2025.3612940","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
The extraction of road network is essential for the generation of high-definition maps since it enables the precise localization of road landmarks and their interconnections. However, generating road network poses a significant challenge due to the conflicting underlying combination of Euclidean (e.g., road landmarks location) and non-Euclidean (e.g., road topological connectivity) structures. Existing methods struggle to merge the two types of data domains effectively, but few of them address it properly. Instead, our work establishes a unified representation of both types of data domain by projecting both Euclidean and non- Euclidean data into an integer series called RoadNet Sequence. Further than modeling an auto-regressive sequence-to-sequence Transformer model to understand RoadNet Sequence, we decouple the dependency of RoadNet Sequence into a mixture of autoregressive and non-autoregressive dependency. Building on this, our proposed non-autoregressive sequence-to-sequence approach leverages non-autoregressive dependencies while fixing the gap towards auto-regressive dependencies, resulting in success in both efficiency and accuracy. We further identify two main bottlenecks in the current RoadNetTransformer on a non-overfitting split of the dataset: poor landmark detection limited by the BEV Encoder and error propagation to topology reasoning. Therefore, we propose Topology-Inherited Training to inherit better topology knowledge into RoadNetTransformer. Additionally, we collect SD-Maps from open-source map datasets and use this prior information to significantly improve landmark detection and reachability. Extensive experiments on the nuScenes dataset demonstrate the superiority of RoadNet Sequence representation and the non-autoregressive approach compared to existing stateof- the-art alternatives. Our code is publicly available at opensource https://github.com/fudan-zvg/RoadNetworkTRansformer.
期刊介绍:
The IEEE Transactions on Pattern Analysis and Machine Intelligence publishes articles on all traditional areas of computer vision and image understanding, all traditional areas of pattern analysis and recognition, and selected areas of machine intelligence, with a particular emphasis on machine learning for pattern analysis. Areas such as techniques for visual search, document and handwriting analysis, medical image analysis, video and image sequence analysis, content-based retrieval of image and video, face and gesture recognition and relevant specialized hardware and/or software architectures are also covered.