Vision-based Human Action Recognition on Pre-trained AlexNet

2019 9th IEEE International Conference on Control System, Computing and Engineering (ICCSCE); Pub Date: 2019-11-01; DOI: 10.1109/ICCSCE47578.2019.9068586

N. M. Zamri, Goh Fan Ling, Pang Ying Han, S. Ooi


Citations: 4

Abstract




Deep learning has been extensively applied to object and pattern recognition because of its excellence in feature extraction and classification. However, this superior performance can only be guaranteed when huge amounts of training data are available, along with high-specification processing units that can process the data in depth at high speed. Transfer learning offers an alternative. In transfer learning, a neural network model is first trained on data similar to the target data; the knowledge it acquires, such as features and weights, can then be leveraged to train a new model. In this project, vision-based human action recognition is performed via transfer learning. Specifically, the proposed approach preserves the earlier layers of a pre-trained AlexNet, because the low-level features they extract are generic features common to most data. The pre-trained network is then fine-tuned on the data of interest, namely human action data. Since AlexNet requires inputs of size 227×227×3, the frames of each video are processed into three different templates: (1) a Motion History Image carrying spatio-temporal information, (2) a binary Motion Energy Image incorporating motion-region information, and (3) an optical flow template holding accumulated motion-speed information. The proposed approach is validated on two publicly available databases, Weizmann and KTH, and the empirical results show promising performance, with about 90% accuracy on these databases.
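The abstract does not say how the templates are computed. As a minimal sketch, assuming grayscale frames, motion segmentation by simple frame differencing with an illustrative threshold (`thresh=30`), and the classic Bobick and Davis update rule, a Motion History Image and a binary Motion Energy Image can be built and turned into a 227×227×3 array. The nearest-neighbour resize and the replication of the single-channel template across three channels are assumptions, not the authors' stated method, and the function names are hypothetical. The optical flow template is omitted because it would need a separate flow estimator.

```python
import numpy as np


def motion_mask(prev, curr, thresh=30):
    """Binary motion mask from absolute frame differencing (assumed segmentation step)."""
    diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
    return diff > thresh


def mhi_and_mei(frames, thresh=30):
    """Compute a normalised Motion History Image and a binary Motion Energy Image.

    frames: array of shape (T, H, W) holding grayscale uint8 frames, T >= 2.
    Returns (mhi, mei): mhi is float in [0, 1] (1 = most recent motion),
    mei is a boolean mask of every pixel that moved at least once.
    """
    frames = np.asarray(frames)
    if frames.shape[0] < 2:
        raise ValueError("need at least two frames")
    tau = frames.shape[0] - 1  # temporal window = number of difference frames
    mhi = np.zeros(frames.shape[1:], dtype=np.float64)
    for t in range(1, frames.shape[0]):
        moving = motion_mask(frames[t - 1], frames[t], thresh)
        # Bobick-Davis update: moving pixels jump to tau, others decay by 1.
        mhi = np.where(moving, tau, np.maximum(mhi - 1, 0))
    mei = mhi > 0  # with this tau, equals the union of all motion masks
    return mhi / tau, mei


def to_alexnet_input(template, size=227):
    """Nearest-neighbour resize to size x size, then replicate to 3 channels."""
    h, w = template.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = template[rows[:, None], cols[None, :]].astype(np.float32)
    return np.repeat(resized[:, :, None], 3, axis=2)


if __name__ == "__main__":
    # Synthetic clip: a 10x10 bright square sliding right by 4 pixels per frame.
    frames = np.zeros((8, 64, 64), dtype=np.uint8)
    for t in range(8):
        frames[t, 20:30, 5 + 4 * t:15 + 4 * t] = 255
    mhi, mei = mhi_and_mei(frames)
    x_mhi = to_alexnet_input(mhi)
    x_mei = to_alexnet_input(mei.astype(np.float32))
    print(x_mhi.shape, x_mei.shape)  # (227, 227, 3) (227, 227, 3)
```

Each resulting 227×227×3 array could then be fed to a fine-tuned AlexNet. The threshold, resize method and channel replication are illustrative choices only, and any pixel scaling or mean subtraction the pre-trained network expects is not handled here.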
