Translating Images to Road Network: A Sequence-to-Sequence Perspective.

IF 18.6 | Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Pattern Analysis and Machine Intelligence Pub Date : 2025-09-23 DOI:10.1109/tpami.2025.3612940

Jiachen Lu, Ming Nie, Bozhou Zhang, Renyuan Peng, Xinyue Cai, Hang Xu, Feng Wen, Wei Zhang, Li Zhang

Citations: 0

Abstract

The extraction of the road network is essential for the generation of high-definition maps, since it enables the precise localization of road landmarks and their interconnections. However, generating a road network poses a significant challenge due to the conflicting underlying combination of Euclidean (e.g., road landmark locations) and non-Euclidean (e.g., road topological connectivity) structures. Existing methods struggle to merge the two types of data domains effectively, and few of them address the problem properly. Instead, our work establishes a unified representation of both types of data domain by projecting both Euclidean and non-Euclidean data into an integer series called RoadNet Sequence. Beyond modeling an autoregressive sequence-to-sequence Transformer to understand the RoadNet Sequence, we decouple the dependencies of the RoadNet Sequence into a mixture of autoregressive and non-autoregressive dependencies. Building on this, our proposed non-autoregressive sequence-to-sequence approach leverages the non-autoregressive dependencies while closing the gap to the autoregressive dependencies, improving both efficiency and accuracy. We further identify two main bottlenecks in the current RoadNetTransformer on a non-overfitting split of the dataset: poor landmark detection, limited by the BEV Encoder, and error propagation to topology reasoning. Therefore, we propose Topology-Inherited Training to inherit better topology knowledge into RoadNetTransformer. Additionally, we collect SD-Maps from open-source map datasets and use this prior information to significantly improve landmark detection and reachability. Extensive experiments on the nuScenes dataset demonstrate the superiority of the RoadNet Sequence representation and the non-autoregressive approach compared to existing state-of-the-art alternatives. Our code is publicly available at https://github.com/fudan-zvg/RoadNetworkTRansformer.
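
To make the idea of a unified integer series concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the token layout, so everything here is an assumption for illustration: each landmark is written as three integers (a quantized x bin, a quantized y bin, and the index of its predecessor shifted by one so that 0 means "no predecessor"), and the names `Landmark`, `quantize`, `encode`, and `decode`, the 100-bin grid, and the ±50 m range are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Landmark:
    x: float      # position in a bird's-eye-view frame (assumed metres)
    y: float
    parent: int   # index of the predecessor landmark, -1 if none


def quantize(value: float, lo: float, hi: float, bins: int) -> int:
    """Map a coordinate in [lo, hi] to an integer bin in [0, bins - 1]."""
    index = int((value - lo) * bins / (hi - lo))
    return min(bins - 1, max(0, index))


def encode(landmarks: List[Landmark], lo: float = -50.0,
           hi: float = 50.0, bins: int = 100) -> List[int]:
    """Flatten positions (Euclidean) and parent links (topology) into integers."""
    seq: List[int] = []
    for lm in landmarks:
        seq.append(quantize(lm.x, lo, hi, bins))
        seq.append(quantize(lm.y, lo, hi, bins))
        seq.append(lm.parent + 1)  # shift so every token is non-negative
    return seq


def decode(seq: List[int], lo: float = -50.0, hi: float = 50.0,
           bins: int = 100) -> List[Tuple[float, float, int]]:
    """Recover bin-centre coordinates and parent indices from the sequence."""
    width = (hi - lo) / bins
    out = []
    for i in range(0, len(seq), 3):
        bx, by, p = seq[i:i + 3]
        out.append((lo + (bx + 0.5) * width, lo + (by + 0.5) * width, p - 1))
    return out


# A three-landmark chain: 0 -> 1 -> 2
road = [Landmark(0.0, 0.0, -1), Landmark(10.0, 0.0, 0), Landmark(10.0, 10.0, 1)]
tokens = encode(road)
recovered = decode(tokens)
```

In this sketch the coordinate tokens carry the Euclidean part and the parent tokens carry the connectivity, so a single sequence model can in principle predict both. Decoding returns bin centres, so positions are recovered only up to the bin width.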

Source journal

IEEE Transactions on Pattern Analysis and Machine Intelligence (Engineering and Technology: Engineering, Electrical and Electronic)

CiteScore

28.40

Self-citation rate

3.00%

Articles published

885

Review time

8.5 months

Journal description: The IEEE Transactions on Pattern Analysis and Machine Intelligence publishes articles on all traditional areas of computer vision and image understanding, all traditional areas of pattern analysis and recognition, and selected areas of machine intelligence, with a particular emphasis on machine learning for pattern analysis. Areas such as techniques for visual search, document and handwriting analysis, medical image analysis, video and image sequence analysis, content-based retrieval of image and video, face and gesture recognition, and relevant specialized hardware and/or software architectures are also covered.