Two-stream Deep Residual Learning with Fisher Criterion for Human Action Recognition
D. V. Sang, Hoang Trung Dung
Proceedings of the 9th International Symposium on Information and Communication Technology, 2018-12-06
DOI: 10.1145/3287921.3287972 (https://doi.org/10.1145/3287921.3287972)
Action recognition is one of the most important areas in the computer vision community. Many previous works use a two-stream CNN model to capture both spatial and temporal cues for prediction. However, the two streams are trained separately and combined only afterwards by late fusion. This strategy overlooks the interaction between spatial and temporal features. In this paper, we propose new two-stream CNN architectures that are able to learn the relation between the two kinds of features. Furthermore, they can be trained end-to-end with the standard backpropagation algorithm. We also introduce a Fisher loss that makes features more discriminative. The experiments show that the Fisher loss yields higher accuracy than using only the softmax loss.
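
The abstract does not state the form of the Fisher loss, so the following is only a minimal sketch of how a Fisher-criterion term might be added to a softmax loss. It assumes the classical Fisher criterion (small within-class scatter relative to between-class scatter) computed on a batch of feature vectors. The names `fisher_style_loss`, `total_loss`, the weight `lam`, and the ratio form itself are hypothetical choices for illustration, not the paper's definitions.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def class_means(features, labels):
    """Return {label: mean feature vector} for one batch."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def fisher_style_loss(features, labels, eps=1e-8):
    """Within-class scatter divided by between-class scatter.

    Assumed form (not taken from the paper): lower values mean that
    features of the same class are tight and class means are far apart.
    """
    n = len(features)
    means = class_means(features, labels)
    global_mean = [sum(col) / n for col in zip(*features)]
    within = sum(sq_dist(x, means[y]) for x, y in zip(features, labels)) / n
    between = sum(
        labels.count(y) * sq_dist(m, global_mean) for y, m in means.items()
    ) / n
    return within / (between + eps)


def softmax_cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for one sample."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[label]


def total_loss(logits_batch, features, labels, lam=0.1):
    """Mean softmax loss plus a weighted Fisher-style term (lam is hypothetical)."""
    ce = sum(
        softmax_cross_entropy(l, y) for l, y in zip(logits_batch, labels)
    ) / len(labels)
    return ce + lam * fisher_style_loss(features, labels)
```

In this sketch, a batch whose classes form tight, well-separated clusters gets a smaller Fisher-style term than a batch whose classes overlap, which is the sense in which such a term would push features to be more discriminative. In a real network the same quantity would be computed on differentiable tensors so that it can be minimized by backpropagation together with the softmax loss.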