将图像转换为道路网络:一个序列到序列的视角。

IF 18.6 1区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Jiachen Lu,Ming Nie,Bozhou Zhang,Renyuan Peng,Xinyue Cai,Hang Xu,Feng Wen,Wei Zhang,Li Zhang
{"title":"将图像转换为道路网络:一个序列到序列的视角。","authors":"Jiachen Lu,Ming Nie,Bozhou Zhang,Renyuan Peng,Xinyue Cai,Hang Xu,Feng Wen,Wei Zhang,Li Zhang","doi":"10.1109/tpami.2025.3612940","DOIUrl":null,"url":null,"abstract":"The extraction of road network is essential for the generation of high-definition maps since it enables the precise localization of road landmarks and their interconnections. However, generating road network poses a significant challenge due to the conflicting underlying combination of Euclidean (e.g., road landmarks location) and non-Euclidean (e.g., road topological connectivity) structures. Existing methods struggle to merge the two types of data domains effectively, but few of them address it properly. Instead, our work establishes a unified representation of both types of data domain by projecting both Euclidean and non- Euclidean data into an integer series called RoadNet Sequence. Further than modeling an auto-regressive sequence-to-sequence Transformer model to understand RoadNet Sequence, we decouple the dependency of RoadNet Sequence into a mixture of autoregressive and non-autoregressive dependency. Building on this, our proposed non-autoregressive sequence-to-sequence approach leverages non-autoregressive dependencies while fixing the gap towards auto-regressive dependencies, resulting in success in both efficiency and accuracy. We further identify two main bottlenecks in the current RoadNetTransformer on a non-overfitting split of the dataset: poor landmark detection limited by the BEV Encoder and error propagation to topology reasoning. Therefore, we propose Topology-Inherited Training to inherit better topology knowledge into RoadNetTransformer. Additionally, we collect SD-Maps from open-source map datasets and use this prior information to significantly improve landmark detection and reachability. Extensive experiments on the nuScenes dataset demonstrate the superiority of RoadNet Sequence representation and the non-autoregressive approach compared to existing stateof- the-art alternatives. Our code is publicly available at opensource https://github.com/fudan-zvg/RoadNetworkTRansformer.","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"22 1","pages":""},"PeriodicalIF":18.6000,"publicationDate":"2025-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Translating Images to Road Network: A Sequence-to-Sequence Perspective.\",\"authors\":\"Jiachen Lu,Ming Nie,Bozhou Zhang,Renyuan Peng,Xinyue Cai,Hang Xu,Feng Wen,Wei Zhang,Li Zhang\",\"doi\":\"10.1109/tpami.2025.3612940\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The extraction of road network is essential for the generation of high-definition maps since it enables the precise localization of road landmarks and their interconnections. However, generating road network poses a significant challenge due to the conflicting underlying combination of Euclidean (e.g., road landmarks location) and non-Euclidean (e.g., road topological connectivity) structures. Existing methods struggle to merge the two types of data domains effectively, but few of them address it properly. Instead, our work establishes a unified representation of both types of data domain by projecting both Euclidean and non- Euclidean data into an integer series called RoadNet Sequence. Further than modeling an auto-regressive sequence-to-sequence Transformer model to understand RoadNet Sequence, we decouple the dependency of RoadNet Sequence into a mixture of autoregressive and non-autoregressive dependency. Building on this, our proposed non-autoregressive sequence-to-sequence approach leverages non-autoregressive dependencies while fixing the gap towards auto-regressive dependencies, resulting in success in both efficiency and accuracy. We further identify two main bottlenecks in the current RoadNetTransformer on a non-overfitting split of the dataset: poor landmark detection limited by the BEV Encoder and error propagation to topology reasoning. Therefore, we propose Topology-Inherited Training to inherit better topology knowledge into RoadNetTransformer. Additionally, we collect SD-Maps from open-source map datasets and use this prior information to significantly improve landmark detection and reachability. Extensive experiments on the nuScenes dataset demonstrate the superiority of RoadNet Sequence representation and the non-autoregressive approach compared to existing stateof- the-art alternatives. Our code is publicly available at opensource https://github.com/fudan-zvg/RoadNetworkTRansformer.\",\"PeriodicalId\":13426,\"journal\":{\"name\":\"IEEE Transactions on Pattern Analysis and Machine Intelligence\",\"volume\":\"22 1\",\"pages\":\"\"},\"PeriodicalIF\":18.6000,\"publicationDate\":\"2025-09-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Pattern Analysis and Machine Intelligence\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1109/tpami.2025.3612940\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Pattern Analysis and Machine Intelligence","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1109/tpami.2025.3612940","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

摘要

道路网的提取对于生成高清晰地图至关重要,因为它可以精确定位道路地标及其相互联系。然而,由于欧几里得(例如,道路地标位置)和非欧几里得(例如,道路拓扑连通性)结构相互冲突的潜在组合,产生道路网络带来了重大挑战。现有的方法很难有效地合并这两种类型的数据域,但很少有方法能正确地解决这个问题。相反,我们的工作通过将欧几里得和非欧几里得数据投影到一个称为RoadNet序列的整数序列中,建立了两种类型数据域的统一表示。除了建模一个自回归序列到序列的Transformer模型来理解RoadNet Sequence之外,我们还将RoadNet Sequence的依赖性解耦为自回归和非自回归依赖性的混合物。在此基础上,我们提出的非自回归序列到序列方法利用了非自回归依赖关系,同时修复了与自回归依赖关系的差距,从而在效率和准确性方面都取得了成功。我们进一步确定了当前RoadNetTransformer在数据集非过拟合分割上的两个主要瓶颈:BEV编码器限制了较差的地标检测和错误传播到拓扑推理。因此,我们提出拓扑继承训练,将更好的拓扑知识继承到RoadNetTransformer中。此外,我们从开源地图数据集中收集SD-Maps,并利用这些先验信息显著提高地标检测和可达性。在nuScenes数据集上进行的大量实验表明,与现有的最先进的替代方法相比,RoadNet序列表示和非自回归方法具有优越性。我们的代码可以在opensource https://github.com/fudan-zvg/RoadNetworkTRansformer上公开获得。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Translating Images to Road Network: A Sequence-to-Sequence Perspective.
The extraction of road network is essential for the generation of high-definition maps since it enables the precise localization of road landmarks and their interconnections. However, generating road network poses a significant challenge due to the conflicting underlying combination of Euclidean (e.g., road landmarks location) and non-Euclidean (e.g., road topological connectivity) structures. Existing methods struggle to merge the two types of data domains effectively, but few of them address it properly. Instead, our work establishes a unified representation of both types of data domain by projecting both Euclidean and non- Euclidean data into an integer series called RoadNet Sequence. Further than modeling an auto-regressive sequence-to-sequence Transformer model to understand RoadNet Sequence, we decouple the dependency of RoadNet Sequence into a mixture of autoregressive and non-autoregressive dependency. Building on this, our proposed non-autoregressive sequence-to-sequence approach leverages non-autoregressive dependencies while fixing the gap towards auto-regressive dependencies, resulting in success in both efficiency and accuracy. We further identify two main bottlenecks in the current RoadNetTransformer on a non-overfitting split of the dataset: poor landmark detection limited by the BEV Encoder and error propagation to topology reasoning. Therefore, we propose Topology-Inherited Training to inherit better topology knowledge into RoadNetTransformer. Additionally, we collect SD-Maps from open-source map datasets and use this prior information to significantly improve landmark detection and reachability. Extensive experiments on the nuScenes dataset demonstrate the superiority of RoadNet Sequence representation and the non-autoregressive approach compared to existing stateof- the-art alternatives. Our code is publicly available at opensource https://github.com/fudan-zvg/RoadNetworkTRansformer.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
CiteScore
28.40
自引率
3.00%
发文量
885
审稿时长
8.5 months
期刊介绍: The IEEE Transactions on Pattern Analysis and Machine Intelligence publishes articles on all traditional areas of computer vision and image understanding, all traditional areas of pattern analysis and recognition, and selected areas of machine intelligence, with a particular emphasis on machine learning for pattern analysis. Areas such as techniques for visual search, document and handwriting analysis, medical image analysis, video and image sequence analysis, content-based retrieval of image and video, face and gesture recognition and relevant specialized hardware and/or software architectures are also covered.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信