{"title":"动作检测视觉强化学习的异构图卷积网络","authors":"Liangliang Wang, Chengxi Huang, Xinwei Chen","doi":"10.1109/CACRE58689.2023.10208414","DOIUrl":null,"url":null,"abstract":"Existing action detection approaches do not take spatio-temporal structural relationships of action clips into account, which leads to a low applicability in real-world scenarios and can benefit detecting if exploited. To this end, this paper proposes to formulate the action detection problem as a reinforcement learning process which is rewarded by observing both the clip sampling and classification results via adjusting the detection schemes. In particular, our framework consists of a heterogeneous graph convolutional network to represent the spatio-temporal features capturing the inherent relation, a policy network which determines the probabilities of a predefined action sampling spaces, and a classification network for action clip recognition. We accomplish the network joint learning by considering the temporal intersection over union and Euclidean distance between detected clips and ground-truth. Experiments on ActivityNet v1.3 and THUMOS14 demonstrate our method.","PeriodicalId":447007,"journal":{"name":"2023 8th International Conference on Automation, Control and Robotics Engineering (CACRE)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Heterogeneous Graph Convolutional Network for Visual Reinforcement Learning of Action Detection\",\"authors\":\"Liangliang Wang, Chengxi Huang, Xinwei Chen\",\"doi\":\"10.1109/CACRE58689.2023.10208414\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Existing action detection approaches do not take spatio-temporal structural relationships of action clips into account, which leads to a low applicability in real-world scenarios and can benefit detecting if exploited. To this end, this paper proposes to formulate the action detection problem as a reinforcement learning process which is rewarded by observing both the clip sampling and classification results via adjusting the detection schemes. In particular, our framework consists of a heterogeneous graph convolutional network to represent the spatio-temporal features capturing the inherent relation, a policy network which determines the probabilities of a predefined action sampling spaces, and a classification network for action clip recognition. We accomplish the network joint learning by considering the temporal intersection over union and Euclidean distance between detected clips and ground-truth. 
Experiments on ActivityNet v1.3 and THUMOS14 demonstrate our method.\",\"PeriodicalId\":447007,\"journal\":{\"name\":\"2023 8th International Conference on Automation, Control and Robotics Engineering (CACRE)\",\"volume\":\"23 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2023 8th International Conference on Automation, Control and Robotics Engineering (CACRE)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CACRE58689.2023.10208414\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 8th International Conference on Automation, Control and Robotics Engineering (CACRE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CACRE58689.2023.10208414","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Heterogeneous Graph Convolutional Network for Visual Reinforcement Learning of Action Detection
Existing action detection approaches do not take the spatio-temporal structural relationships of action clips into account, which limits their applicability in real-world scenarios, even though such relationships could benefit detection if exploited. To this end, this paper formulates action detection as a reinforcement learning process in which the agent adjusts its detection scheme and is rewarded based on both the clip sampling and the classification results. In particular, our framework consists of a heterogeneous graph convolutional network that represents the spatio-temporal features and captures their inherent relations, a policy network that outputs probabilities over a predefined action sampling space, and a classification network for action clip recognition. We train the networks jointly using the temporal intersection over union and the Euclidean distance between detected clips and the ground truth. Experiments on ActivityNet v1.3 and THUMOS14 demonstrate the effectiveness of our method.
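As a concrete illustration of the joint-learning signal described above, the sketch below computes a scalar reward from the temporal intersection over union (tIoU) and the Euclidean distance between a detected clip's boundaries and the ground truth. The weighting coefficient `alpha`, the exponential decay on the distance term, and the function names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def temporal_iou(clip, gt):
    """Temporal intersection over union between a detected clip and a
    ground-truth clip, each given as a (start, end) pair in seconds."""
    inter = max(0.0, min(clip[1], gt[1]) - max(clip[0], gt[0]))
    union = max(clip[1], gt[1]) - min(clip[0], gt[0])
    return inter / union if union > 0 else 0.0

def clip_reward(clip, gt, alpha=0.5):
    """Hypothetical combined reward: a weighted sum of tIoU and a term that
    decays with the Euclidean distance between the clip and ground-truth
    boundaries. The weighting and the exponential decay are assumptions
    made for illustration only."""
    tiou = temporal_iou(clip, gt)
    dist = np.linalg.norm(np.asarray(clip, dtype=float) - np.asarray(gt, dtype=float))
    return alpha * tiou + (1.0 - alpha) * np.exp(-dist)

# Example: a detected clip covering most of the ground-truth interval.
print(clip_reward((12.0, 30.0), (10.0, 32.0)))
```

In a setup like this, the resulting scalar reward could be fed back to the policy network, for example through a policy-gradient update, to adjust the probabilities over the predefined action sampling space.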