Keisuke Hamamoto, Huimin Lu, Yujie Li, Tohru Kamiya, Y. Nakatoh, S. Serikawa
DOI: 10.1109/COMPSAC54236.2022.00187
Published in: 2022 IEEE 46th Annual Computers, Software, and Applications Conference (COMPSAC), June 2022
Grasp Position Estimation from Depth Image Using Stacked Hourglass Network Structure
In recent years, robots have come into use beyond factories. However, most robots currently deployed can only perform pre-programmed actions in a predefined space. For robots to become widespread, not only in factories and distribution warehouses but also in homes and other environments where they receive complex commands and their surroundings change constantly, robots must be made intelligent. This study therefore proposes a deep learning model that estimates grasp positions from depth images, aiming at intelligent pick-and-place. Only depth images are used as training data. While some previous studies have combined RGB and depth images, we use depth images alone so that inference is based on the object's shape, independent of its color information. Because the model does not depend on RGB information, it is expected to minimize the need for re-training when the appearance of the target object's packaging changes on a production line. The proposed model builds on the stacked encoder-decoder structure of the Stacked Hourglass Network. We compared the proposed method with a baseline under the same evaluation metrics and on a real robot, and it achieved higher accuracy than methods from previous studies.
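The stacked encoder-decoder idea behind the Stacked Hourglass Network can be illustrated with a minimal sketch: each "hourglass" repeatedly downsamples the input, processes it at the coarsest resolution, then upsamples and merges with skip connections, and several hourglasses are stacked so each produces an intermediate heatmap. The NumPy sketch below is purely illustrative and is not the authors' trained model: pooling and nearest-neighbor upsampling stand in for learned convolutions, and all function names (`hourglass`, `stacked_hourglass`, `grasp_position`) are hypothetical.

```python
import numpy as np

def downsample(x):
    # 2x2 max pooling (stand-in for a strided/pooled conv layer)
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample(x):
    # Nearest-neighbor 2x upsampling (stand-in for a learned decoder layer)
    return x.repeat(2, axis=0).repeat(2, axis=1)

def hourglass(x, depth):
    # One hourglass: recurse to the coarsest scale, then merge the
    # upsampled coarse signal with the same-scale skip connection.
    if depth == 0:
        return x
    skip = x
    low = hourglass(downsample(x), depth - 1)
    return skip + upsample(low)

def stacked_hourglass(depth_image, num_stacks=2, depth=3):
    # Stack hourglasses end to end; each stage emits a heatmap that,
    # in the real network, would receive intermediate supervision.
    heatmaps = []
    x = depth_image
    for _ in range(num_stacks):
        x = hourglass(x, depth)
        heatmaps.append(x)
    return heatmaps

def grasp_position(heatmap):
    # Read the grasp point off the final heatmap as its argmax pixel.
    return np.unravel_index(np.argmax(heatmap), heatmap.shape)
```

In this toy version a single bright pixel in the input depth map stays the argmax of every stage's heatmap, which mirrors how the real network's heatmap peak encodes the estimated grasp position.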