{"title":"Sequential Deep Learning for Disaster-Related Video Classification","authors":"Haiman Tian, Hector Cen Zheng, Shu‐Ching Chen","doi":"10.1109/MIPR.2018.00026","DOIUrl":null,"url":null,"abstract":"Videos serve to convey complex semantic information and ease the understanding of new knowledge. However, when mixed semantic meanings from different modalities (i.e., image, video, text) are involved, it is more difficult for a computer model to detect and classify the concepts (such as flood, storm, and animals). This paper presents a multimodal deep learning framework to improve video concept classification by leveraging recent advances in transfer learning and sequential deep learning models. Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNN) models are then used to obtain the sequential semantics for both audio and textual models. The proposed framework is applied to a disaster-related video dataset that includes not only disaster scenes, but also the activities that took place during the disaster event. The experimental results show the effectiveness of the proposed framework.","PeriodicalId":320000,"journal":{"name":"2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"351 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/MIPR.2018.00026","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 12
Abstract
Videos convey complex semantic information and ease the understanding of new knowledge. However, when mixed semantic meanings from different modalities (e.g., image, video, text) are involved, it is more difficult for a computational model to detect and classify concepts such as flood, storm, and animals. This paper presents a multimodal deep learning framework that improves video concept classification by leveraging recent advances in transfer learning and sequential deep learning models. Long Short-Term Memory (LSTM) Recurrent Neural Network (RNN) models are then used to obtain the sequential semantics of both the audio and textual modalities. The proposed framework is applied to a disaster-related video dataset that includes not only disaster scenes but also the activities that took place during the disaster events. The experimental results demonstrate the effectiveness of the proposed framework.
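The abstract does not specify the sequential model beyond an LSTM RNN over per-modality features. As a rough illustration only, the following is a minimal PyTorch sketch of such a sequence-to-concept classifier; the feature dimensionality, hidden size, and number of concept classes are assumptions, not values from the paper.

```python
# Minimal sketch (not the authors' implementation): an LSTM that maps a
# sequence of per-step feature vectors (e.g., frame, audio, or token
# features from a pre-trained model) to concept scores for one clip.
import torch
import torch.nn as nn

class SequenceConceptClassifier(nn.Module):
    def __init__(self, feature_dim=2048, hidden_dim=256, num_classes=10):
        # feature_dim, hidden_dim, num_classes are illustrative guesses.
        super().__init__()
        # The LSTM consumes one feature vector per time step and keeps a
        # running summary of the sequence in its hidden state.
        self.lstm = nn.LSTM(feature_dim, hidden_dim, batch_first=True)
        # A linear head turns the final hidden state into concept logits.
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):
        # x: (batch, seq_len, feature_dim)
        _, (h_n, _) = self.lstm(x)       # h_n: (1, batch, hidden_dim)
        return self.classifier(h_n[-1])  # (batch, num_classes) logits

# Usage: score a batch of 4 clips, each a 30-step feature sequence.
model = SequenceConceptClassifier()
logits = model(torch.randn(4, 30, 2048))
print(logits.shape)  # torch.Size([4, 10])
```

In a multimodal setup like the one the abstract describes, one such sequence model per modality could be trained and their concept scores fused, but the fusion scheme is not detailed in the abstract.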