Robust human action recognition via long short-term memory
Alexander Grushin, Derek Monner, J. Reggia, A. Mishra
The 2013 International Joint Conference on Neural Networks (IJCNN), August 2013. DOI: 10.1109/IJCNN.2013.6706797
The long short-term memory (LSTM) neural network utilizes specialized modulation mechanisms to store information for extended periods of time. It is thus potentially well-suited for complex visual processing, where the current video frame must be considered in the context of past frames. Recent studies have indeed shown that LSTM can effectively recognize and classify human actions (e.g., running, hand waving) in video data; however, these results were achieved under somewhat restricted settings. In this effort, we seek to demonstrate that LSTM's performance remains robust even as experimental conditions deteriorate. Specifically, we show that classification accuracy exhibits graceful degradation when the LSTM network is faced with (a) lower quantities of available training data, (b) tighter deadlines for decision making (i.e., shorter available input data sequences), and (c) poorer video quality (resulting from noise, dropped frames, or reduced resolution). We also clearly demonstrate the benefits of memory for video processing, particularly under high noise or frame drop rates. Our study is thus an initial step towards demonstrating LSTM's potential for robust action recognition in real-world scenarios.
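The "specialized modulation mechanisms" mentioned in the abstract are the LSTM cell's gates, which control what gets written to, retained in, and read from a persistent cell state. A minimal sketch of one time step (the function name `lstm_step`, the toy dimensions, and the random weights are illustrative, not taken from the paper):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step: gates modulate what is stored in,
    kept in, and exposed from the cell state c."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b          # joint pre-activations, shape (4H,)
    i = sigmoid(z[0*H:1*H])             # input gate: how much new content to write
    f = sigmoid(z[1*H:2*H])             # forget gate: how much old state to keep
    o = sigmoid(z[2*H:3*H])             # output gate: how much state to expose
    g = np.tanh(z[3*H:4*H])             # candidate cell content
    c = f * c_prev + i * g              # cell state can persist across many frames
    h = o * np.tanh(c)                  # hidden output for this frame
    return h, c

# Toy run over a "video": one feature vector per frame, processed in order,
# so frame t is interpreted in the context carried forward from frames 0..t-1.
rng = np.random.default_rng(0)
D, H, T = 8, 4, 10                      # input dim, hidden dim, number of frames
W = rng.normal(0, 0.1, (4*H, D))
U = rng.normal(0, 0.1, (4*H, H))
b = np.zeros(4*H)
h, c = np.zeros(H), np.zeros(H)
for t in range(T):
    h, c = lstm_step(rng.normal(size=D), h, c, W, U, b)
print(h.shape)  # (4,)
```

For classification, the final hidden state `h` (or a pooling over all per-frame states) would be fed to a softmax layer over action classes; a dropped or noisy frame perturbs only one step's input, while the gated cell state carries context across it, which is the intuition behind the graceful degradation the abstract reports.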