{"title":"Multimodal Target Localization With Landmark-Aware Positioning for Urban Mobility","authors":"Naoki Hosomi;Yui Iioka;Shumpei Hatanaka;Teruhisa Misu;Kentaro Yamada;Nanami Tsukamoto;Shunsuke Kobayashi;Komei Sugiura","doi":"10.1109/LRA.2024.3511404","DOIUrl":null,"url":null,"abstract":"Advancements in vehicle automation technology are expected to significantly impact how humans interact with vehicles. In this study, we propose a method to create user-friendly control interfaces for autonomous vehicles in urban environments. The proposed model predicts the vehicle's destination on the images captured by the vehicle's cameras based on high-level navigation instructions. Our data analysis found that users often specify the destination based on the relative positions of landmarks in a scene. The task is challenging because users can specify arbitrary destinations on roads, which do not have distinct visual characteristics for prediction. Thus, the model should consider relationships between landmarks and the ideal stopping position. Existing approaches only model the relationships between instructions and destinations and do not explicitly model the relative positional relationships between landmarks and destinations. To address this limitation, the proposed Target Regressor in Positioning (TRiP) model includes a novel loss function, Landmark-aware Absolute-Relative Target Position Loss, and two novel modules, Target Position Localizer and Multi-Resolution Referring Expression Comprehension Feature Extractor. To validate TRiP, we built a new dataset by extending an existing dataset of referring expression comprehension. 
The model was evaluated on the dataset using a standard metric, and the results showed that TRiP significantly outperformed the baseline method.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 1","pages":"716-723"},"PeriodicalIF":4.6000,"publicationDate":"2024-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Robotics and Automation Letters","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10777394/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ROBOTICS","Score":null,"Total":0}
Citations: 0
Abstract
Advancements in vehicle automation technology are expected to significantly impact how humans interact with vehicles. In this study, we propose a method to create user-friendly control interfaces for autonomous vehicles in urban environments. The proposed model predicts the vehicle's destination in images captured by the vehicle's cameras, based on high-level navigation instructions. Our data analysis found that users often specify the destination based on the relative positions of landmarks in a scene. The task is challenging because users can specify arbitrary destinations on roads, which lack distinct visual characteristics for prediction. Thus, the model should consider relationships between landmarks and the ideal stopping position. Existing approaches only model the relationships between instructions and destinations and do not explicitly model the relative positional relationships between landmarks and destinations. To address this limitation, the proposed Target Regressor in Positioning (TRiP) model includes a novel loss function, the Landmark-aware Absolute-Relative Target Position Loss, and two novel modules, the Target Position Localizer and the Multi-Resolution Referring Expression Comprehension Feature Extractor. To validate TRiP, we built a new dataset by extending an existing referring expression comprehension dataset. The model was evaluated on this dataset using a standard metric, and the results showed that TRiP significantly outperformed the baseline method.
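The abstract names a loss that combines an absolute target-position term with landmark-relative terms, but does not give its formulation. The following is a minimal illustrative sketch of that general idea, not the paper's actual loss: it assumes a model that outputs an absolute target position plus per-landmark offset predictions, and all names, weights, and shapes here (`landmark_aware_loss`, `alpha`, K landmarks in image coordinates) are hypothetical.

```python
import numpy as np

def landmark_aware_loss(pred_pos, pred_offsets, target, landmarks, alpha=0.5):
    """Hypothetical absolute+relative target-position loss (illustration only).

    pred_pos:     (2,) predicted target position in image coordinates.
    pred_offsets: (K, 2) predicted offsets of the target from each landmark.
    target:       (2,) ground-truth target position.
    landmarks:    (K, 2) landmark positions in the same coordinate frame.
    alpha:        assumed weight balancing the two terms.
    """
    # Absolute term: squared error between predicted and true positions.
    abs_term = np.sum((pred_pos - target) ** 2)
    # Relative term: predicted per-landmark offsets vs. the true offsets
    # (target minus landmark), averaged over landmarks.
    true_offsets = target - landmarks
    rel_term = np.mean(np.sum((pred_offsets - true_offsets) ** 2, axis=1))
    return alpha * abs_term + (1 - alpha) * rel_term
```

The point of the relative term is that even when the target itself (a spot on the road) has no distinctive appearance, its offsets from visually salient landmarks are supervisable, which is the intuition the abstract describes.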
Journal introduction:
The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.