A Prediction Method for Dimensional Sentiment Analysis of the Movie and TV Drama based on Variable-length Sequence Input
Authors: Chunxiao Wang, Jingiing Zhang, Lihong Gan, Wei Jiang
Published in: 2022 International Conference on Culture-Oriented Science and Technology (CoST), 2022-08-01
DOI: https://doi.org/10.1109/CoST57098.2022.00010
Time-continuous emotion prediction has long been one of the difficulties in affective video content analysis. Current research mainly handles long videos by dividing them into short segments of fixed duration and predicting emotion for each segment. These methods ignore both the temporal dependencies between short clips and the mood changes within each clip. Therefore, drawing on the concepts of film and television narrative structure in cinematic language, this paper proposes a prediction method for dimensional sentiment analysis of movies and TV dramas based on variable-length sequence inputs. First, the paper defines a method for partitioning a video into variable-length audiovisual sequences, which serve as the subunits of dimensional emotion prediction. Then, it proposes a method for extracting and combining the audio and visual features of each variable-length audiovisual sequence. Finally, it designs a dimensional emotion prediction network that takes these variable-length sequences as input. This paper focuses on dimensional sentiment prediction and evaluates the proposed method on the extended COGNIMUSE dataset. The method achieves performance comparable to other methods while increasing prediction speed, with the Mean Square Error (MSE) reduced from 0.13 to 0.11 for arousal and from 0.19 to 0.13 for valence.
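As a rough illustration of two ideas mentioned in the abstract, the sketch below shows one common way to batch variable-length feature sequences (padding plus a validity mask) and how an MSE score can be computed separately for arousal and valence. It is not the authors' implementation: the function names `pad_sequences` and `mean_squared_error`, the two-dimensional feature vectors, and all numbers are hypothetical and chosen only for demonstration.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def pad_sequences(
    seqs: Sequence[Sequence[Vector]], pad_value: float = 0.0
) -> Tuple[List[List[List[float]]], List[List[int]]]:
    """Pad variable-length sequences of feature vectors to a common length.

    Returns the padded batch and a mask (1 = real step, 0 = padding).
    Assumes every sequence is non-empty and all vectors share one dimension.
    """
    max_len = max(len(s) for s in seqs)
    dim = len(seqs[0][0])
    padded: List[List[List[float]]] = []
    mask: List[List[int]] = []
    for s in seqs:
        n_pad = max_len - len(s)
        padded.append([list(v) for v in s] + [[pad_value] * dim for _ in range(n_pad)])
        mask.append([1] * len(s) + [0] * n_pad)
    return padded, mask


def mean_squared_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean of squared differences between predictions and annotations."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and equally long")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


# Two hypothetical audiovisual sequences: 3 steps and 1 step, 2 features each.
batch = [
    [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    [[0.7, 0.8]],
]
padded, mask = pad_sequences(batch)
print(mask)  # [[1, 1, 1], [1, 0, 0]]

# Hypothetical per-sequence predictions and annotations, scored per dimension.
arousal_mse = mean_squared_error([0.5, 0.1], [0.3, 0.1])
valence_mse = mean_squared_error([-0.2, 0.4], [0.0, 0.4])
print(arousal_mse, valence_mse)
```

The mask lets a downstream model ignore padded steps, so sequences of different durations can share one batch without the padding influencing the prediction.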