Markus Grotz, P. Kaiser, E. Aksoy, Fabian Paus, T. Asfour
{"title":"Graph-based visual semantic perception for humanoid robots","authors":"Markus Grotz, P. Kaiser, E. Aksoy, Fabian Paus, T. Asfour","doi":"10.1109/HUMANOIDS.2017.8246974","DOIUrl":null,"url":null,"abstract":"Semantic understanding of unstructured environments plays an essential role in the autonomous planning and execution of whole-body humanoid locomotion and manipulation tasks. We introduce a new graph-based and data-driven method for semantic representation of unknown environments based on visual sensor data streams. The proposed method extends our previous work, in which loco-manipulation scene affordances are detected in a fully unsupervised manner. We build a geometric primitive-based model of the perceived scene and assign interaction possibilities, i.e. affordances, to the individual primitives. The major contribution of this paper is the enrichment of the extracted scene representation with semantic object information through spatio-temporal fusion of primitives during the perception. To this end, we combine the primitive-based scene representation with object detection methods to identify higher semantic structures in the scene. The qualitative and quantitative evaluation of the proposed method in various experiments in simulation and on the humanoid robot ARMAR-III demonstrates the effectiveness of the approach.","PeriodicalId":143992,"journal":{"name":"2017 IEEE-RAS 17th International Conference on Humanoid Robotics (Humanoids)","volume":"83 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE-RAS 17th International Conference on Humanoid Robotics (Humanoids)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HUMANOIDS.2017.8246974","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5
Abstract
Semantic understanding of unstructured environments plays an essential role in the autonomous planning and execution of whole-body humanoid locomotion and manipulation tasks. We introduce a new graph-based and data-driven method for semantic representation of unknown environments based on visual sensor data streams. The proposed method extends our previous work, in which loco-manipulation scene affordances are detected in a fully unsupervised manner. We build a geometric primitive-based model of the perceived scene and assign interaction possibilities, i.e. affordances, to the individual primitives. The major contribution of this paper is the enrichment of the extracted scene representation with semantic object information through spatio-temporal fusion of primitives during the perception. To this end, we combine the primitive-based scene representation with object detection methods to identify higher semantic structures in the scene. The qualitative and quantitative evaluation of the proposed method in various experiments in simulation and on the humanoid robot ARMAR-III demonstrates the effectiveness of the approach.