Bing Du, Ji Zhao, Mingyuan Cao, Mingyang Li, Hailong Yu
2021 14th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), 2021-10-23. DOI: 10.1109/CISP-BMEI53629.2021.9624427
Behavior Recognition Based on Improved Faster RCNN
We divide the recognition process into two stages: object detection and behavior prediction. First, all objects in the image are detected; the detection results are then fed to the behavior-recognition stage, which predicts the interactions between objects. During feature extraction, we add offset parameters to the sampling points of each convolution kernel, allowing the kernels to deform so that the network adapts better to complex scenes. For object detection, an attention mechanism is combined with the ResNet backbone, and the residual blocks are changed from post-activation to pre-activation, which gives the region proposals a degree of screening ability and helps avoid overfitting. For action prediction, the network takes each instance in the feature map as a center, detects the interacting objects around it according to their appearance features and attention weights, and predicts the action scores between them. Finally, the network is trained on an augmented COCO dataset. Compared with traditional methods, the proposed approach detects actions in images well: mAP reaches 67.2%, an improvement of nearly 14 percentage points, which is of high experimental value.
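The deformable-kernel idea described in the abstract can be sketched as follows: each tap of a convolution kernel samples the feature map at its regular grid position plus a learned fractional offset, using bilinear interpolation. This is a minimal single-channel NumPy sketch of that sampling rule under assumed names (`bilinear_sample`, `deformable_conv_point`) and a 3x3 kernel; it is illustrative only and not the paper's implementation.

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Bilinearly interpolate feat at fractional (y, x); zero outside bounds."""
    H, W = feat.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    val = 0.0
    for dy in (0, 1):
        for dx in (0, 1):
            yy, xx = y0 + dy, x0 + dx
            if 0 <= yy < H and 0 <= xx < W:
                # weight falls off linearly with distance to the grid point
                val += (1 - abs(y - yy)) * (1 - abs(x - xx)) * feat[yy, xx]
    return val

def deformable_conv_point(feat, weights, offsets, cy, cx):
    """3x3 deformable convolution at one output location (cy, cx).

    weights: (9,) kernel weights in row-major tap order.
    offsets: (9, 2) learned (dy, dx) offset per tap -- the extra
             parameters that let the kernel "deform".
    """
    taps = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    out = 0.0
    for i, (ky, kx) in enumerate(taps):
        oy, ox = offsets[i]
        out += weights[i] * bilinear_sample(feat, cy + ky + oy, cx + kx + ox)
    return out
```

With all offsets at zero this reduces exactly to an ordinary 3x3 convolution; non-zero offsets let each tap drift toward more informative positions, which is the adaptability to complex scenes the abstract refers to. In practice this is typically done with a library primitive such as `torchvision.ops.deform_conv2d`.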