Vu Lam, Sang Phan Le, Tien Do, T. Ngo, Duy-Dinh Le, D. Duong
Title: Computational optimization for violent scenes detection
Published in: 2016 International Conference on Computer, Control, Informatics and its Applications (IC3INA), October 2016
DOI: https://doi.org/10.1109/IC3INA.2016.7863039
Citations: 4
Abstract
Violent scenes detection (VSD) can be considered a specific instance of multimedia event detection. One popular approach to this problem is to employ multiple modalities for representation; combining complementary modalities has been shown to yield remarkable improvements in accuracy. However, such an approach also incurs a high computational cost, since it must process features extracted globally and locally from static frames, video sequences, and audio streams, as well as deep visual features. In this paper, we address the problem of modality selection (i.e., feature selection) when computing resources (both CPU and GPU) are limited. We evaluated possible combinations of features under different specifications of the computing resource. The evaluation results can be used to choose the set of features that achieves the highest accuracy within a pre-selected resource budget. We conducted experiments on the benchmark dataset MediaEval VSD 2014 (60 hours in total).
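The selection problem the abstract describes can be sketched as a small budgeted subset search. The sketch below is illustrative only: the feature names, per-feature costs, standalone accuracies, and the late-fusion proxy are all hypothetical assumptions, not values or methods from the paper, which measures accuracy and cost empirically.

```python
from itertools import combinations

# Hypothetical per-feature compute costs and standalone accuracies.
# These numbers are illustrative placeholders, NOT results from the paper.
FEATURES = {
    "static_frame": {"cost": 10, "acc": 0.55},
    "motion":       {"cost": 25, "acc": 0.58},
    "audio":        {"cost": 5,  "acc": 0.50},
    "deep_visual":  {"cost": 40, "acc": 0.65},
}

def fused_score(subset):
    """Crude late-fusion proxy: treat each modality's errors as
    independent, so a scene is missed only if every modality misses it."""
    p_miss = 1.0
    for name in subset:
        p_miss *= 1.0 - FEATURES[name]["acc"]
    return 1.0 - p_miss

def best_combination(budget):
    """Exhaustively pick the feature subset with the highest fused
    score whose total cost fits within the compute budget."""
    best, best_score = (), 0.0
    names = list(FEATURES)
    for r in range(1, len(names) + 1):
        for subset in combinations(names, r):
            cost = sum(FEATURES[n]["cost"] for n in subset)
            score = fused_score(subset)
            if cost <= budget and score > best_score:
                best, best_score = subset, score
    return best, best_score
```

With only four modalities the exhaustive search over all 2^4 - 1 subsets is trivial; for larger feature pools the same budgeted-selection idea would call for a knapsack-style or greedy approximation instead.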