EmoVerse: Enhancing Multimodal Large Language Models for Affective Computing via Multitask Learning
Ao Li, Longwei Xu, Chen Ling, Jinghui Zhang, Pengwei Wang
Neurocomputing, Volume 650, Article 130810 (published 2025-07-02)
DOI: 10.1016/j.neucom.2025.130810
https://www.sciencedirect.com/science/article/pii/S0925231225014821
Citations: 0
Abstract
Affective computing is essential for applications such as human–computer interaction. While Multimodal Large Language Models (MLLMs) demonstrate impressive general capabilities, they face considerable challenges in affective computing, particularly in detecting subtle facial expressions and handling complex affective tasks, such as emotion reason inference and understanding emotions in multimodal long-context scenarios. Furthermore, no unified MLLM yet handles a wide range of affective tasks effectively. To address these challenges, we propose Emotion Universe (EmoVerse), an MLLM trained through a Multistage Multitask Sentiment and Emotion (M2SE) instruction tuning strategy. Through this training strategy, EmoVerse acquires the ability to deeply analyze the underlying reasons for affective states. In addition, to address the lack of multitask datasets in affective computing, we introduce the Affective Multitask (AMT) Dataset, which supports multimodal sentiment analysis, multimodal emotion recognition, facial expression recognition, emotion reason inference, and emotion-cause pair extraction. Extensive experiments demonstrate that EmoVerse outperforms existing methods, achieving state-of-the-art results on affective tasks. The code is available at https://github.com/liaolea/EmoVerse.
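To make the multitask setup more concrete, the Python sketch below shows what a single instruction-tuning record for a dataset like AMT might look like. This is a minimal illustration only: the class name, field names, and task labels are assumptions for exposition, not the actual schema released with EmoVerse.

```python
# Hypothetical sketch of one multitask instruction-tuning record.
# All field names and task labels are illustrative assumptions,
# not the published AMT/EmoVerse format.
from dataclasses import dataclass
from typing import List


@dataclass
class AMTSample:
    task: str              # e.g. "emotion_reason_inference"
    instruction: str       # natural-language prompt given to the MLLM
    modalities: List[str]  # which inputs this sample carries
    video_path: str = ""   # path to the visual input, if any
    response: str = ""     # target answer; for reasoning tasks this
                           # includes the explanation, not just a label


sample = AMTSample(
    task="emotion_reason_inference",
    instruction="Watch the clip and explain why the speaker feels this way.",
    modalities=["video", "audio", "text"],
    video_path="clips/dialogue_0042.mp4",
    response="The speaker is frustrated because ...",
)

# In a multistage multitask regime, training batches would interleave
# records from several such tasks so the model learns shared
# affective representations across them.
print(sample.task, "->", sample.instruction)
```

A usage note: mixing tasks at the record level like this is what lets a single model serve recognition tasks (discrete labels) and inference tasks (free-form explanations) from one instruction-tuned checkpoint.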
About the journal
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. The journal covers neurocomputing theory, practice, and applications.