{"title":"基于学习融合的多视角多模态动作识别","authors":"Sandy Ardianto, H. Hang","doi":"10.23919/APSIPA.2018.8659539","DOIUrl":null,"url":null,"abstract":"In this paper, we study multi-modal and multi-view action recognition system based on the deep-learning techniques. We extended the Temporal Segment Network with additional data fusion stage to combine information from different sources. In this research, we use multiple types of information from different modality such as RGB, depth, infrared data to detect predefined human actions. We tested various combinations of these data sources to examine their impact on the final detection accuracy. We designed 3 information fusion methods to generate the final decision. The most interested one is the Learned Fusion Net designed by us. It turns out the Learned Fusion structure has the best results but requires more training.","PeriodicalId":287799,"journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"87 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":"{\"title\":\"Multi-View and Multi-Modal Action Recognition with Learned Fusion\",\"authors\":\"Sandy Ardianto, H. Hang\",\"doi\":\"10.23919/APSIPA.2018.8659539\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we study multi-modal and multi-view action recognition system based on the deep-learning techniques. We extended the Temporal Segment Network with additional data fusion stage to combine information from different sources. In this research, we use multiple types of information from different modality such as RGB, depth, infrared data to detect predefined human actions. We tested various combinations of these data sources to examine their impact on the final detection accuracy. We designed 3 information fusion methods to generate the final decision. The most interested one is the Learned Fusion Net designed by us. It turns out the Learned Fusion structure has the best results but requires more training.\",\"PeriodicalId\":287799,\"journal\":{\"name\":\"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)\",\"volume\":\"87 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"9\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.23919/APSIPA.2018.8659539\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.23919/APSIPA.2018.8659539","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Multi-View and Multi-Modal Action Recognition with Learned Fusion
In this paper, we study multi-modal and multi-view action recognition system based on the deep-learning techniques. We extended the Temporal Segment Network with additional data fusion stage to combine information from different sources. In this research, we use multiple types of information from different modality such as RGB, depth, infrared data to detect predefined human actions. We tested various combinations of these data sources to examine their impact on the final detection accuracy. We designed 3 information fusion methods to generate the final decision. The most interested one is the Learned Fusion Net designed by us. It turns out the Learned Fusion structure has the best results but requires more training.