Sentiment analysis for live video comments with variational residual representations

IF 3.4 · Zone 3, Computer Science · JCR Q2, Computer Science, Artificial Intelligence

Computer Speech and Language Pub Date : 2025-06-09 DOI:10.1016/j.csl.2025.101838

Changfan Luo, Ling Fang, Bensheng Qiu

Volume 95, Article 101838. Article URL: https://www.sciencedirect.com/science/article/pii/S0885230825000634

Citations: 0

Abstract

Live video comments (LVCs) are valuable for public opinion analysis, communication, and user engagement. Analyzing the sentiment of LVCs is crucial for understanding their content, especially when strong emotions are involved. However, compared with ordinary text, LVCs are more strongly real-time and context-dependent, and they exhibit cross-modal misalignment. Conventional sentiment analysis methods rely solely on textual information and explicit context, while current multi-modal sentiment analysis models cannot adequately discriminate relevant context or align multi-modal information. To address these challenges, we propose a novel variational residual fusion network, based on a variational autoencoder, for sentiment analysis of LVCs. Specifically, an autofilter module in the encoder selects useful surrounding comments as contextual information for the target comment. A residual fusion module embedded between the encoder and decoder identifies the most relevant visual information, facilitating the alignment of multi-modal information and thereby improving the learned representation of the target comment. Furthermore, our method follows a multi-task learning scheme that reinforces the representation of the target comment and improves sentiment analysis performance. Extensive experiments suggest that the proposed framework is effective.
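
The abstract does not give the training objective, so the following is only a generic sketch of how a variational autoencoder is commonly trained in a multi-task setting: a reconstruction term, a KL term that regularizes the latent code, and a sentiment classification term. The function names, the squared-error reconstruction term, and the weights `beta` and `gamma` are illustrative assumptions, not details taken from the paper.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def cross_entropy(logits, label):
    """Softmax cross-entropy for one example, computed stably."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_norm - logits[label]


def squared_error(target, reconstruction):
    """Sum of squared differences between target and reconstruction."""
    return sum((t - r) ** 2 for t, r in zip(target, reconstruction))


def multitask_loss(target, reconstruction, mu, log_var, logits, label,
                   beta=1.0, gamma=1.0):
    """Reconstruction + beta * KL + gamma * classification.

    beta and gamma are hypothetical weights chosen for illustration.
    """
    return (squared_error(target, reconstruction)
            + beta * kl_to_standard_normal(mu, log_var)
            + gamma * cross_entropy(logits, label))
```

With a perfect reconstruction and a latent posterior equal to the standard normal prior, only the classification term remains, which makes the contribution of each term easy to inspect in isolation.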




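Source journal

Computer Speech and Language (Engineering and Technology: Computer Science, Artificial Intelligence)

CiteScore: 11.30

Self-citation rate: 4.70%

Articles published:

Review time: 22.9 weeks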

About the journal: Computer Speech & Language publishes reports of original research related to the recognition, understanding, production, coding and mining of speech and language. The speech and language sciences have a long history, but only relatively recently has it become feasible to implement and experiment with complex models of speech and language processing at large scale. Such research is often carried out somewhat separately by practitioners of artificial intelligence, computer science, electronic engineering, information retrieval, linguistics, phonetics, or psychology.