C. Sevastopoulos, Mohammad Zaki Zadeh, Michail Theofanidis, Sneh Acharya, Nishit Patel, F. Makedon
{"title":"Towards Safe Visual Navigation of a Wheelchair Using Landmark Detection","authors":"C. Sevastopoulos, Mohammad Zaki Zadeh, Michail Theofanidis, Sneh Acharya, Nishit Patel, F. Makedon","doi":"10.3390/technologies11030064","DOIUrl":null,"url":null,"abstract":"This article presents a method for extracting high-level semantic information through successful landmark detection using 2D RGB images. In particular, the focus is placed on the presence of particular labels (open path, humans, staircase, doorways, obstacles) in the encountered scene, which can be a fundamental source of information enhancing scene understanding and paving the path towards the safe navigation of the mobile unit. Experiments are conducted using a manual wheelchair to gather image instances from four indoor academic environments consisting of multiple labels. Afterwards, the fine-tuning of a pretrained vision transformer (ViT) is conducted, and the performance is evaluated through an ablation study versus well-established state-of-the-art deep architectures for image classification such as ResNet. Results show that the fine-tuned ViT outperforms all other deep convolutional architectures while achieving satisfactory levels of generalization.","PeriodicalId":22341,"journal":{"name":"Technologies","volume":"42 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2023-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Technologies","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/technologies11030064","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
This article presents a method for extracting high-level semantic information through successful landmark detection using 2D RGB images. In particular, the focus is placed on the presence of particular labels (open path, humans, staircase, doorways, obstacles) in the encountered scene, which can be a fundamental source of information enhancing scene understanding and paving the path towards the safe navigation of the mobile unit. Experiments are conducted using a manual wheelchair to gather image instances from four indoor academic environments consisting of multiple labels. Afterwards, the fine-tuning of a pretrained vision transformer (ViT) is conducted, and the performance is evaluated through an ablation study versus well-established state-of-the-art deep architectures for image classification such as ResNet. Results show that the fine-tuned ViT outperforms all other deep convolutional architectures while achieving satisfactory levels of generalization.