End-to-end soccer video scene and event classification with deep transfer learning
Authors: Yuxi Hong, Chen Ling, Zuochang Ye
Venue: 2018 International Conference on Intelligent Systems and Computer Vision (ISCV)
Publication date: 2018-04-01
DOI: https://doi.org/10.1109/ISACV.2018.8369043
Citations: 17
Abstract
Soccer video scene classification and event classification are two essential tasks in soccer video semantic analysis, and both have attracted considerable research interest because of their importance and practicality. However, most proposed methods solve the two tasks separately. To solve both at the same time and improve the efficiency of video processing, we treat them as a single end-to-end classification task. We introduce a new Soccer Video Scene and Event Dataset (SVSED), which contains 600 video clips in six categories drawn from scenes and events. We then show that frame features of different categories, extracted with a pretrained CNN model, are separable in 3-D space. Finally, we construct a CNN model for the classification task and use deep transfer learning to improve classification results given the relatively small training dataset. We fine-tuned several state-of-the-art CNN models and achieved accuracy above 89% with only several minutes of training.
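The transfer-learning idea in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline. It assumes frame features have already been produced by a frozen pretrained CNN, and it trains only a new six-way softmax layer on top of them. The features here are synthetic stand-ins, and `FEATURE_DIM`, the cluster layout, the learning rate, and all function names are hypothetical choices made for illustration. The code uses only the Python standard library so that it runs without a deep learning framework.

```python
import math
import random

NUM_CLASSES = 6   # SVSED has six scene/event categories
FEATURE_DIM = 8   # hypothetical; real CNN feature vectors are far larger


def make_synthetic_features(per_class=30, seed=0):
    """Return (feature, label) pairs standing in for frozen-CNN frame features."""
    rng = random.Random(seed)
    data = []
    for label in range(NUM_CLASSES):
        # Assumed layout: each class is shifted along its own feature axis.
        center = [3.0 if d == label else 0.0 for d in range(FEATURE_DIM)]
        for _ in range(per_class):
            data.append(([c + rng.gauss(0.0, 0.5) for c in center], label))
    rng.shuffle(data)
    return data


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(weights, biases, x):
    """Class probabilities from the linear head for one feature vector."""
    logits = [sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)


def train_head(data, epochs=50, lr=0.1):
    """Train only the new classification layer with per-sample SGD."""
    weights = [[0.0] * FEATURE_DIM for _ in range(NUM_CLASSES)]
    biases = [0.0] * NUM_CLASSES
    for _ in range(epochs):
        for x, label in data:
            probs = predict_proba(weights, biases, x)
            for k in range(NUM_CLASSES):
                # Gradient of cross-entropy loss with respect to logit k.
                grad = probs[k] - (1.0 if k == label else 0.0)
                biases[k] -= lr * grad
                for d in range(FEATURE_DIM):
                    weights[k][d] -= lr * grad * x[d]
    return weights, biases


def accuracy(weights, biases, data):
    """Fraction of samples whose most probable class matches the label."""
    correct = 0
    for x, label in data:
        probs = predict_proba(weights, biases, x)
        if probs.index(max(probs)) == label:
            correct += 1
    return correct / len(data)


data = make_synthetic_features()
split = int(0.8 * len(data))
train, test = data[:split], data[split:]
weights, biases = train_head(train)
print(f"held-out accuracy on synthetic features: {accuracy(weights, biases, test):.2f}")
```

Because only the small head is trained while the feature extractor stays fixed, training is cheap, which is consistent with the short training times the abstract reports. Fine-tuning, as described in the abstract, would additionally update some or all of the pretrained CNN's weights, which this sketch does not do.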