GAF-RCNN: Grid attention fusion 3D object detection from point cloud
Zheng Li, Guofeng Tong, Hao Peng, Mingwei Ma
Cobot, published 2023-02-21. DOI: 10.12688/cobot.17590.1
Background: Due to the refinement of regions of interest (RoIs), two-stage 3D detection algorithms usually achieve better performance than most single-stage detectors. However, most two-stage methods use feature concatenation to aggregate grid-point features with multi-scale RoI pooling in the second stage. This concatenation does not consider the correlation between multi-scale grid features. Methods: In the first stage, we employ 3D sparse convolution and 2D convolution to extract rich semantic features. A small number of coarse RoIs are then predicted by a region proposal network (RPN) on the generated bird's eye view (BEV) map. After that, we adopt a voxel RoI-pooling strategy to aggregate the non-empty neighborhood voxel features of each grid point in an RoI from the last two layers of the 3D sparse convolution. In this way, we obtain two aggregated features from the 3D sparse voxel space for each grid point. Next, we design an attention feature fusion module. This module includes a local and a global attention layer, which can fully integrate the grid-point features from the different voxel layers. Results: We carry out experiments on the Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) dataset. On the three difficulty levels (easy, moderate, and hard), the average precisions of our proposed method are 88.21%, 81.51%, and 77.07% for 3D detection, and 92.30%, 90.19%, and 86.00% for BEV detection. Conclusions: In this paper, we propose a novel two-stage 3D detection algorithm from point clouds, named Grid Attention Fusion Region-based Convolutional Neural Network (GAF-RCNN). Because we integrate the multi-scale RoI grid features with an attention mechanism in the refinement stage, the multi-scale features are better correlated, achieving performance competitive with other well-tested detection algorithms.
This 3D object detection has important implications for robot and cobot technology.
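The abstract describes fusing two per-grid-point feature sets (pooled from different sparse-convolution layers) with attention weights instead of plain concatenation. The paper's actual module is not specified here; the following is a minimal NumPy sketch of the general idea — weighting two feature sources by softmax attention — with all function names, shapes, and the toy scoring vector being hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_fuse(feat_a, feat_b, w):
    """Fuse two per-grid-point feature maps with scalar attention weights.

    feat_a, feat_b : (N, C) grid-point features pooled from two voxel layers.
    w              : (2C,) hypothetical learned scoring vector (stand-in for
                     the paper's local/global attention layers).
    Returns an (N, C) fused feature map.
    """
    concat = np.concatenate([feat_a, feat_b], axis=1)     # (N, 2C)
    # Toy scores for each source, derived from the concatenated pair.
    scores = np.stack([concat @ w, concat @ (-w)], axis=1)  # (N, 2)
    alpha = softmax(scores, axis=1)                         # per-point weights
    # Weighted combination instead of plain concatenation, so the two
    # scales are correlated through the shared attention weights.
    return alpha[:, :1] * feat_a + alpha[:, 1:] * feat_b

rng = np.random.default_rng(0)
a = rng.normal(size=(4, 8))
b = rng.normal(size=(4, 8))
fused = attention_fuse(a, b, rng.normal(size=16))
print(fused.shape)  # (4, 8)
```

Because the attention weights are a softmax, each fused value is a convex combination of the two source features; a concatenation-based head, by contrast, leaves the scales uncoupled until the next layer.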
Journal introduction:
Cobot is a rapid, multidisciplinary open access publishing platform for research in the interdisciplinary field of collaborative robots. Cobot aims to enhance knowledge and share the results of the latest innovative technologies among technicians, researchers, and experts engaged in collaborative robot research. The platform welcomes submissions in all areas of scientific and technical research related to collaborative robots, and all articles benefit from open peer review.
The scope of Cobot includes, but is not limited to:
● Intelligent robots
● Artificial intelligence
● Human-machine collaboration and integration
● Machine vision
● Intelligent sensing
● Smart materials
● Design, development and testing of collaborative robots
● Software for cobots
● Industrial applications of cobots
● Service applications of cobots
● Medical and health applications of cobots
● Educational applications of cobots
As well as research articles and case studies, Cobot accepts a variety of article types, including method articles, study protocols, software tools, systematic reviews, data notes, brief reports, and opinion articles.