A Deep Reinforcement Learning Framework for Identifying Funny Scenes in Movies
Authors: Haoqi Li, Naveen Kumar, Ruxin Chen, P. Georgiou
Venue: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3116-3120
Published: 2018-04-24
DOI: 10.1109/ICASSP.2018.8462686 (https://doi.org/10.1109/ICASSP.2018.8462686)
This paper presents a novel deep Reinforcement Learning (RL) framework for classifying movie scenes by affect, using the face images detected in the video stream as input. Extracting affective information from video is a challenging task: it involves complex visual and temporal representations intertwined with complex aspects of human perception and information integration. This also makes it difficult to collect a large annotated corpus, which restricts the use of supervised learning methods. We present an alternative learning framework based on RL that is tolerant to label sparsity and can easily make use of any available ground truth in an online fashion. We employ this modified RL model for the binary classification of whether a scene is funny or not on a dataset of movie scene clips. The results show that our model reaches an accuracy of 72.95% on 2–3 minute long movie scenes, while on shorter scenes the accuracy is 84.13%.
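The abstract does not give the model's details, so the following is only a toy sketch of the general idea of treating scene classification as an RL problem with sparse labels. It is not the authors' method: it swaps the deep network and face images for linear Q-learning over a single synthetic scalar feature per frame. Every name here (`run_episode`, `make_scene`, `terminal_reward`, the "smile intensity" feature) is a hypothetical chosen for illustration. The agent emits a funny/not-funny guess at each frame, receives zero reward on every frame except the last, and is scored against the scene-level label only at the end.

```python
import random

ACTIONS = (0, 1)  # 0 = "not funny", 1 = "funny" (hypothetical encoding)


def q_value(weights, feature, action):
    """Linear Q estimate for one action: w * feature + b."""
    w, b = weights[action]
    return w * feature + b


def greedy_action(weights, feature):
    """Action with the highest Q estimate (ties go to action 0)."""
    return max(ACTIONS, key=lambda a: q_value(weights, feature, a))


def terminal_reward(action, label):
    """Scene-level reward: the only point where ground truth is used."""
    return 1.0 if action == label else -1.0


def make_scene(label, length, rng):
    """Synthetic scene: one scalar per frame, clipped to [0, 1].
    Assumption for the toy data only: funny scenes have higher values."""
    center = 0.7 if label == 1 else 0.3
    return [min(1.0, max(0.0, rng.gauss(center, 0.15))) for _ in range(length)]


def run_episode(weights, frames, label, lr=0.05, gamma=0.9, epsilon=0.1, rng=random):
    """One online pass over a scene with a Q-learning update per frame.
    Intermediate frames get zero reward and bootstrap from the next frame;
    the last frame is compared against the scene label."""
    for t, x in enumerate(frames):
        # Epsilon-greedy exploration.
        if rng.random() < epsilon:
            a = rng.choice(ACTIONS)
        else:
            a = greedy_action(weights, x)
        if t == len(frames) - 1:
            target = terminal_reward(a, label)
        else:
            nxt = frames[t + 1]
            target = gamma * max(q_value(weights, nxt, b) for b in ACTIONS)
        td_error = target - q_value(weights, x, a)
        w, b = weights[a]
        weights[a] = (w + lr * td_error * x, b + lr * td_error)
    # Prediction for the scene after the update on its last frame.
    return greedy_action(weights, frames[-1])


if __name__ == "__main__":
    rng = random.Random(0)
    weights = [(0.0, 0.0), (0.0, 0.0)]
    for _ in range(2000):
        label = rng.choice(ACTIONS)
        run_episode(weights, make_scene(label, 10, rng), label, rng=rng)
    print(weights)
```

Because the update runs frame by frame and only needs a label at the end of a scene, the same loop can consume labeled scenes as they become available, which is the sense of "online" assumed in this sketch.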