Sentiment analysis for live video comments with variational residual representations
Authors: Changfan Luo, Ling Fang, Bensheng Qiu
Journal: Computer Speech and Language, volume 95, Article 101838
Publication date: 2025-06-09
DOI: 10.1016/j.csl.2025.101838
URL: https://www.sciencedirect.com/science/article/pii/S0885230825000634
Journal impact factor: 3.4; JCR: Q2 (Computer Science, Artificial Intelligence)
Citations: 0
Abstract
Live video comments (LVCs) are valuable for public opinion analysis, communication, and user engagement. Analyzing the sentiment of LVCs is crucial for understanding their content, especially when strong emotions are involved. However, compared to ordinary text, LVCs are more strongly real-time, depend more heavily on context, and are often misaligned with the other modalities. Conventional sentiment analysis methods rely solely on textual information and explicit context, while current multi-modal sentiment analysis models are insufficient for discriminating context and aligning multi-modal information. To address these challenges, we propose a novel variational residual fusion network, based on a variational autoencoder, for sentiment analysis of LVCs. Specifically, an autofilter module is introduced in the encoder to select useful surrounding comments as contextual information for the target comment. A residual fusion module is embedded between the encoder and decoder to identify the most relevant visual information, facilitating the alignment of multi-modal information and thereby enhancing the learning of the target comment representation. Furthermore, our method follows a multi-task learning scheme to help the model reinforce the representation of the target comments and improve the effectiveness of sentiment analysis. Extensive experiments suggest that the proposed framework is effective.
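The abstract names four ingredients (context filtering, residual fusion of visual information, a variational autoencoder, and a multi-task objective) without giving equations. The toy sketch below shows one common way such pieces can be wired together; it is not the authors' implementation. Every function name, the use of softmax attention as a stand-in for the autofilter and for visual selection, the additive residual connection, and the loss weights `beta` and `gamma` are assumptions made purely for illustration.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, candidates):
    """Softmax-attention pooling of candidates with respect to query.

    Assumption: used here as a stand-in both for the 'autofilter'
    (weighting surrounding comments) and for picking relevant frames.
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, c) / scale for c in candidates])
    pooled = [sum(w * c[i] for w, c in zip(weights, candidates))
              for i in range(len(query))]
    return pooled, weights


def residual_fuse(target, extra):
    """Assumed residual fusion: add the pooled signal to the target."""
    return [t + e for t, e in zip(target, extra)]


def reparameterize(mu, log_var, rng):
    """Standard VAE reparameterization: z = mu + sigma * eps."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, I)) for a diagonal Gaussian."""
    return -0.5 * sum(1 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


def multitask_loss(recon, target, mu, log_var, class_probs, label,
                   beta=1.0, gamma=1.0):
    """Reconstruction + beta * KL + gamma * sentiment cross-entropy."""
    recon_loss = sum((r - t) ** 2 for r, t in zip(recon, target)) / len(target)
    kl = kl_to_standard_normal(mu, log_var)
    ce = -math.log(class_probs[label])
    return recon_loss + beta * kl + gamma * ce


if __name__ == "__main__":
    rng = random.Random(0)
    target = [1.0, 0.0]                      # hypothetical target-comment embedding
    neighbours = [[0.9, 0.1], [0.0, 1.0]]    # hypothetical surrounding comments
    frames = [[0.5, 0.5], [1.0, -1.0]]       # hypothetical visual features

    context, ctx_w = attend(target, neighbours)
    encoded = residual_fuse(target, context)
    visual, vis_w = attend(encoded, frames)
    mu = residual_fuse(encoded, visual)
    log_var = [0.0, 0.0]                     # fixed here; learned in a real model
    z = reparameterize(mu, log_var, rng)
    probs = softmax(z)                       # toy two-class sentiment head
    loss = multitask_loss(z, target, mu, log_var, probs, label=0)
    print("context weights:", ctx_w)
    print("loss:", loss)
```

In a trained model the means, variances, decoder, and classifier would all be learned networks; the sketch fixes them to trivial choices so that the data flow stays visible.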
About the journal:
Computer Speech & Language publishes reports of original research related to the recognition, understanding, production, coding and mining of speech and language.
The speech and language sciences have a long history, but it is only relatively recently that large-scale implementation of and experimentation with complex models of speech and language processing has become feasible. Such research is often carried out somewhat separately by practitioners of artificial intelligence, computer science, electronic engineering, information retrieval, linguistics, phonetics, or psychology.