Natural Language Representation as Features for Place Recognition
A. Lee, H. Myung
2022 19th International Conference on Ubiquitous Robots (UR), published 2022-07-04
DOI: https://doi.org/10.1109/ur55393.2022.9826253
Visual information is rich in content, and robots need computer vision techniques to encode images into a form they can use. Robot vision typically transforms an image into descriptors using predefined patterns, whether handcrafted or learned. However, such descriptors are not interpretable by humans, which limits human-robot interaction in vision tasks. Recent studies, on the other hand, have introduced efficient and extensible methods for transforming an image into natural language. With visual transformers, the context of an image is translated into a natural language representation. To create an image representation that is understandable to both humans and artificial intelligence, this paper presents a method that uses a language-image model to produce natural language representations for robotic place recognition.
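As a rough illustration of how a natural language representation could serve as a place descriptor, the sketch below matches a query caption against a small database of captions using bag-of-words cosine similarity. This is not the method of the paper, which the abstract does not detail. The captions, the stopword list, and the function names (`tokenize`, `cosine_similarity`, `recognize_place`) are all hypothetical, and the captions stand in for whatever text a language-image model would produce.

```python
import math
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"a", "an", "the", "and", "with", "in", "on", "of", "to"}


def tokenize(caption: str) -> Counter:
    """Lowercase a caption and count its non-stopword tokens."""
    words = re.findall(r"[a-z]+", caption.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty caption matches nothing
    return dot / (norm_a * norm_b)


def recognize_place(query_caption: str, database: dict) -> tuple:
    """Return (place_id, score) for the stored caption most similar to the query."""
    query = tokenize(query_caption)
    scored = [
        (place_id, cosine_similarity(query, tokenize(caption)))
        for place_id, caption in database.items()
    ]
    return max(scored, key=lambda item: item[1])


# Hypothetical captions standing in for the output of an image-to-text model.
database = {
    "lobby": "a lobby with a reception desk and two sofas",
    "kitchen": "a kitchen with a sink and a refrigerator",
    "corridor": "a long corridor with doors on both sides",
}
best_place, best_score = recognize_place(
    "a reception desk next to a sofa in a lobby", database
)
```

Because the descriptor is plain text, a person can read both the query and the matched caption and see why the match was made, which is the interpretability benefit the abstract argues for. A practical system would likely replace the word-count comparison with a learned text embedding.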