{"title":"Open Action Recognition by A 3D Convolutional Neural Network Combining with An Open Fuzzy Min-Max Neural Network","authors":"Chia-Ying Wu, Y. Tsay, A. C. Shih","doi":"10.1109/ARIS56205.2022.9910444","DOIUrl":null,"url":null,"abstract":"The 3-dimensional convolution neural network (3D CNN) has demonstrated a high prediction power for action recognition, when the inputs belong to the known classes. In a real application, however, if considering the inputs from unknown classes, previous studies have revealed that some prediction results can have high softmax scores falsely for known classes. That is called the open set recognition problem. Recently, a series of statistical methods based on an openmax approach have been proposed to solve the problem in 2D image data. However, how to apply the approach to video data is still unknown. Without using a prior statistical model, we propose a two-stage approach for open action recognition in this paper. A 3D CNN model is trained in the first stage. Then, the activation vector data, the output from the activation layer, are extracted as the feature data for training a fuzzy min-max neural network (FMMNN) as a classifier in the second stage. Since the value ranges of an activation vector are not limited between 0 and 1, an open FMMNN with a new fuzzy membership function without the normalization of input data is proposed and then constructed by the feature data. Finally, the prediction output is selected by the class with the maximum membership value. In the results, two separated datasets of mouse action videos were used for the training and the prediction test, respectively. We found that the proposed method can indeed improve the prediction performance. Moreover, using the human action and random background videos as two unknown datasets, we also demonstrated that the prediction outputs from known and unknown sets can be distinguished by a single threshold. In short, the proposed open FNNMM can not only improve the prediction performance from the inputs from known classes but also detect the inputs from unknown classes.","PeriodicalId":254572,"journal":{"name":"2022 International Conference on Advanced Robotics and Intelligent Systems (ARIS)","volume":"11 3","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 International Conference on Advanced Robotics and Intelligent Systems (ARIS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ARIS56205.2022.9910444","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
The 3-dimensional convolution neural network (3D CNN) has demonstrated a high prediction power for action recognition, when the inputs belong to the known classes. In a real application, however, if considering the inputs from unknown classes, previous studies have revealed that some prediction results can have high softmax scores falsely for known classes. That is called the open set recognition problem. Recently, a series of statistical methods based on an openmax approach have been proposed to solve the problem in 2D image data. However, how to apply the approach to video data is still unknown. Without using a prior statistical model, we propose a two-stage approach for open action recognition in this paper. A 3D CNN model is trained in the first stage. Then, the activation vector data, the output from the activation layer, are extracted as the feature data for training a fuzzy min-max neural network (FMMNN) as a classifier in the second stage. Since the value ranges of an activation vector are not limited between 0 and 1, an open FMMNN with a new fuzzy membership function without the normalization of input data is proposed and then constructed by the feature data. Finally, the prediction output is selected by the class with the maximum membership value. In the results, two separated datasets of mouse action videos were used for the training and the prediction test, respectively. We found that the proposed method can indeed improve the prediction performance. Moreover, using the human action and random background videos as two unknown datasets, we also demonstrated that the prediction outputs from known and unknown sets can be distinguished by a single threshold. In short, the proposed open FNNMM can not only improve the prediction performance from the inputs from known classes but also detect the inputs from unknown classes.