Yifang Yin, Wenmiao Hu, An Tran, Ying Zhang, Guanfeng Wang, H. Kruppa, Roger Zimmermann, See-Kiong Ng
{"title":"Multimodal Deep Learning for Robust Road Attribute Detection","authors":"Yifang Yin, Wenmiao Hu, An Tran, Ying Zhang, Guanfeng Wang, H. Kruppa, Roger Zimmermann, See-Kiong Ng","doi":"10.1145/3618108","DOIUrl":null,"url":null,"abstract":"Automatic inference of missing road attributes (e.g., road type and speed limit) for enriching digital maps has attracted significant research attention in recent years. A number of machine learning based approaches have been proposed to detect road attributes from GPS traces, dash-cam videos, or satellite images. However, existing solutions mostly focus on a single modality without modeling the correlations among multiple data sources. To bridge this gap, we present a multimodal road attribute detection method, which improves the robustness by performing pixel-level fusion of crowdsourced GPS traces and satellite images. A GPS trace is usually given by a sequence of location, bearing, and speed. To align it with satellite imagery in the spatial domain, we render GPS traces into a sequence of multi-channel images that simultaneously capture the global distribution of the GPS points, the local distribution of vehicles’ moving directions and speeds, and their temporal changes over time, at each pixel. Unlike previous GPS based road feature extraction methods, our proposed GPS rendering does not require map matching in the data preprocessing step. Moreover, our multimodal solution addresses single-modal challenges such as occlusions in satellite images and data sparsity in GPS traces by learning the pixel-wise correspondences among different data sources. On top of this, we observe that geographic objects and their attributes in the map are not isolated but correlated with each other. Thus, if a road is partially labeled, the existing information can be of great help on inferring the missing attributes. To fully use the existing information, we extend our model and discuss the possibilities for further performance improvement when partially labeled map data is available. Extensive experiments have been conducted on two real-world datasets in Singapore and Jakarta. Compared with previous work, our method is able to improve the detection accuracy on road attributes by a large margin.","PeriodicalId":43641,"journal":{"name":"ACM Transactions on Spatial Algorithms and Systems","volume":null,"pages":null},"PeriodicalIF":1.2000,"publicationDate":"2023-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Spatial Algorithms and Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3618108","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"REMOTE SENSING","Score":null,"Total":0}
Citations: 0
Abstract
Automatic inference of missing road attributes (e.g., road type and speed limit) for enriching digital maps has attracted significant research attention in recent years. A number of machine-learning-based approaches have been proposed to detect road attributes from GPS traces, dash-cam videos, or satellite images. However, existing solutions mostly focus on a single modality without modeling the correlations among multiple data sources. To bridge this gap, we present a multimodal road attribute detection method that improves robustness by performing pixel-level fusion of crowdsourced GPS traces and satellite images. A GPS trace is typically given as a sequence of location, bearing, and speed measurements. To align it with satellite imagery in the spatial domain, we render GPS traces into a sequence of multi-channel images that simultaneously capture, at each pixel, the global distribution of the GPS points, the local distribution of vehicles’ moving directions and speeds, and their changes over time. Unlike previous GPS-based road feature extraction methods, our proposed GPS rendering does not require map matching as a preprocessing step. Moreover, our multimodal solution addresses single-modal challenges such as occlusions in satellite images and data sparsity in GPS traces by learning the pixel-wise correspondences among different data sources. On top of this, we observe that geographic objects and their attributes in a map are not isolated but correlated with each other. Thus, if a road is partially labeled, the existing information can greatly help in inferring the missing attributes. To fully exploit such information, we extend our model and discuss how performance can be further improved when partially labeled map data is available. Extensive experiments have been conducted on two real-world datasets from Singapore and Jakarta. Compared with previous work, our method improves road attribute detection accuracy by a large margin.
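To make the GPS-rendering idea concrete, the sketch below (not the authors' released code) rasterizes a set of GPS points into a multi-channel image aligned with a satellite tile. The channel layout, grid size, and bin counts are illustrative assumptions, and the temporal dimension (a sequence of such rasters over time windows) is omitted for brevity.

```python
# A minimal sketch of rendering GPS traces into a multi-channel raster
# aligned with a satellite image tile. All design choices here (channel
# layout, 256x256 grid, 8 bearing bins) are assumptions for illustration.
import numpy as np

def render_gps_traces(points, tile_bounds, size=256, n_dir_bins=8):
    """Rasterize GPS points into a multi-channel image.

    points      : array of shape (N, 4), columns (lon, lat, bearing_deg, speed)
    tile_bounds : (min_lon, min_lat, max_lon, max_lat) of the satellite tile
    size        : output raster is size x size pixels
    n_dir_bins  : number of bearing histogram bins per pixel

    Returns an array of shape (2 + n_dir_bins, size, size):
      channel 0     -- GPS point density (global distribution of points)
      channel 1     -- mean speed at each pixel
      channels 2..  -- per-pixel histogram of moving directions
    """
    min_lon, min_lat, max_lon, max_lat = tile_bounds
    img = np.zeros((2 + n_dir_bins, size, size), dtype=np.float32)
    speed_sum = np.zeros((size, size), dtype=np.float32)

    for lon, lat, bearing, speed in points:
        # Map the point into pixel coordinates (row 0 at the top, as in images).
        x = int((lon - min_lon) / (max_lon - min_lon) * (size - 1))
        y = int((max_lat - lat) / (max_lat - min_lat) * (size - 1))
        if not (0 <= x < size and 0 <= y < size):
            continue
        img[0, y, x] += 1.0                                # point density
        speed_sum[y, x] += speed
        d = int(bearing % 360.0 / (360.0 / n_dir_bins))    # bearing bin index
        img[2 + d, y, x] += 1.0

    counts = np.maximum(img[0], 1.0)
    img[1] = speed_sum / counts        # mean speed per pixel (0 where no points)
    img[0] = np.log1p(img[0])          # compress heavy-tailed point counts
    return img
```

Because the GPS raster and the satellite tile then share the same pixel grid, the simplest form of pixel-level fusion is channel-wise concatenation before a segmentation-style network. The architecture below is a hypothetical sketch of that idea, not the paper's model.

```python
# A hypothetical pixel-level fusion network: satellite RGB and the GPS raster
# are concatenated along the channel axis, so every convolution sees both
# modalities at each pixel. Layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class FusionNet(nn.Module):
    def __init__(self, gps_channels=10, n_classes=5):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3 + gps_channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Conv2d(64, n_classes, 1)  # per-pixel attribute logits

    def forward(self, satellite_rgb, gps_raster):
        x = torch.cat([satellite_rgb, gps_raster], dim=1)  # pixel-aligned fusion
        return self.head(self.backbone(x))
```

With the defaults above, render_gps_traces produces 2 + 8 = 10 channels, matching gps_channels=10 in FusionNet, so a (3, 256, 256) satellite tile concatenated with the raster yields a (13, 256, 256) input.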
About the Journal:
ACM Transactions on Spatial Algorithms and Systems (TSAS) is a scholarly journal that publishes the highest quality papers on all aspects of spatial algorithms and systems and closely related disciplines. It has a multi-disciplinary perspective, spanning a large number of areas where spatial data is manipulated or visualized (regardless of how it is specified, i.e., geometrically or textually), such as geography, geographic information systems (GIS), geospatial and spatiotemporal databases, spatial and metric indexing, location-based services, web-based spatial applications, geographic information retrieval (GIR), spatial reasoning and mining, and security and privacy, as well as the related visual computing areas of computer graphics, computer vision, geometric modeling, and visualization, where spatial, geospatial, and spatiotemporal data is central.