{"title":"基于零射击学习的细粒度人体动作识别","authors":"Yahui Zhao, Ping Shi, Ji’an You","doi":"10.1109/ICSESS47205.2019.9040818","DOIUrl":null,"url":null,"abstract":"In recent years, the number of categories of human action recognition is increasing rapidly. On the one hand, the traditional supervised learning model has become increasingly difficult to collect enough training data to identify all categories. On the other hand, for some well-trained traditional supervised learning models, it is a waste of time to collect enough samples of new categories and retrain them together in order to identify new categories. We proposes a mapping between visual features of video and semantic description of fine-grained human action recognition. Unlike most current zero-shot learning methods, which use manual features as visual features, we uses features learnt from I3D network model as visual features, which are more general than manual features.","PeriodicalId":203944,"journal":{"name":"2019 IEEE 10th International Conference on Software Engineering and Service Science (ICSESS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Fine-grained Human Action Recognition Based on Zero-Shot Learning\",\"authors\":\"Yahui Zhao, Ping Shi, Ji’an You\",\"doi\":\"10.1109/ICSESS47205.2019.9040818\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In recent years, the number of categories of human action recognition is increasing rapidly. On the one hand, the traditional supervised learning model has become increasingly difficult to collect enough training data to identify all categories. On the other hand, for some well-trained traditional supervised learning models, it is a waste of time to collect enough samples of new categories and retrain them together in order to identify new categories. We proposes a mapping between visual features of video and semantic description of fine-grained human action recognition. 
Unlike most current zero-shot learning methods, which use manual features as visual features, we uses features learnt from I3D network model as visual features, which are more general than manual features.\",\"PeriodicalId\":203944,\"journal\":{\"name\":\"2019 IEEE 10th International Conference on Software Engineering and Service Science (ICSESS)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 IEEE 10th International Conference on Software Engineering and Service Science (ICSESS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICSESS47205.2019.9040818\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE 10th International Conference on Software Engineering and Service Science (ICSESS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSESS47205.2019.9040818","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Fine-grained Human Action Recognition Based on Zero-Shot Learning
In recent years, the number of human action categories to be recognized has grown rapidly. On the one hand, it has become increasingly difficult for traditional supervised learning models to collect enough training data to cover every category. On the other hand, for a well-trained supervised model, collecting enough samples of new categories and retraining the model just to recognize those categories is costly and time-consuming. We propose a mapping between the visual features of a video and a semantic description of the action for fine-grained human action recognition. Unlike most existing zero-shot learning methods, which rely on hand-crafted visual features, we use features learned by an I3D network as visual features, which are more general than hand-crafted features.
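To make the visual-to-semantic mapping concrete, the following is a minimal sketch of one common way such a mapping is realized, not the authors' exact formulation: a ridge-regression projection from I3D video features into a semantic space (e.g., attributes or word embeddings), with unseen actions classified by nearest class embedding. The feature dimensions, the regularization weight, and the cosine-similarity classifier are illustrative assumptions.

# Minimal zero-shot action recognition sketch (illustrative assumptions:
# 1024-d I3D features, 300-d semantic class embeddings, ridge-regression
# mapping, cosine-similarity classification over unseen-class embeddings).
import numpy as np

def fit_visual_to_semantic(X, S, lam=1.0):
    """Learn W mapping visual features X (N x d_v) to semantic
    embeddings S (N x d_s) via closed-form ridge regression:
    W = (X^T X + lam*I)^-1 X^T S."""
    d_v = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d_v), X.T @ S)

def predict_unseen(x, W, class_embeddings):
    """Project a visual feature into semantic space and return the index
    of the unseen class whose embedding is most cosine-similar."""
    s_hat = x @ W
    sims = class_embeddings @ s_hat / (
        np.linalg.norm(class_embeddings, axis=1) * np.linalg.norm(s_hat) + 1e-8)
    return int(np.argmax(sims))

# Toy usage with random stand-ins for I3D features and class embeddings.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 1024))      # seen-class I3D features
S_train = rng.normal(size=(200, 300))       # their semantic embeddings
W = fit_visual_to_semantic(X_train, S_train)
unseen_classes = rng.normal(size=(5, 300))  # embeddings of 5 unseen actions
print(predict_unseen(rng.normal(size=1024), W, unseen_classes))

In such a setup, new action categories can be recognized by supplying only their semantic embeddings; no video samples of the new categories or retraining of the visual model are required, which is the practical motivation stated in the abstract.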