{"title":"多任务学习和互信息最大化与跨模态变换器用于多模态情感分析","authors":"Yang Shi, Jinglang Cai, Lei Liao","doi":"10.1007/s10844-024-00858-9","DOIUrl":null,"url":null,"abstract":"<p>The effectiveness of multimodal sentiment analysis hinges on the seamless integration of information from diverse modalities, where the quality of modality fusion directly influences sentiment analysis accuracy. Prior methods often rely on intricate fusion strategies, elevating computational costs and potentially yielding inaccurate multimodal representations due to distribution gaps and information redundancy across heterogeneous modalities. This paper centers on the backpropagation of loss and introduces a Transformer-based model called Multi-Task Learning and Mutual Information Maximization with Crossmodal Transformer (MMMT). Addressing the issue of inaccurate multimodal representation for MSA, MMMT effectively combines mutual information maximization with crossmodal Transformer to convey more modality-invariant information to multimodal representation, fully exploring modal commonalities. Notably, it utilizes multi-modal labels for uni-modal training, presenting a fresh perspective on multi-task learning in MSA. Comparative experiments on the CMU-MOSI and CMU-MOSEI datasets demonstrate that MMMT improves model accuracy while reducing computational burden, making it suitable for resource-constrained and real-time performance-requiring application scenarios. Additionally, ablation experiments validate the efficacy of multi-task learning and probe the specific impact of combining mutual information maximization with Transformer in MSA.</p>","PeriodicalId":56119,"journal":{"name":"Journal of Intelligent Information Systems","volume":"16 1","pages":""},"PeriodicalIF":2.3000,"publicationDate":"2024-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Multi-task learning and mutual information maximization with crossmodal transformer for multimodal sentiment analysis\",\"authors\":\"Yang Shi, Jinglang Cai, Lei Liao\",\"doi\":\"10.1007/s10844-024-00858-9\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>The effectiveness of multimodal sentiment analysis hinges on the seamless integration of information from diverse modalities, where the quality of modality fusion directly influences sentiment analysis accuracy. Prior methods often rely on intricate fusion strategies, elevating computational costs and potentially yielding inaccurate multimodal representations due to distribution gaps and information redundancy across heterogeneous modalities. This paper centers on the backpropagation of loss and introduces a Transformer-based model called Multi-Task Learning and Mutual Information Maximization with Crossmodal Transformer (MMMT). Addressing the issue of inaccurate multimodal representation for MSA, MMMT effectively combines mutual information maximization with crossmodal Transformer to convey more modality-invariant information to multimodal representation, fully exploring modal commonalities. Notably, it utilizes multi-modal labels for uni-modal training, presenting a fresh perspective on multi-task learning in MSA. Comparative experiments on the CMU-MOSI and CMU-MOSEI datasets demonstrate that MMMT improves model accuracy while reducing computational burden, making it suitable for resource-constrained and real-time performance-requiring application scenarios. 
Additionally, ablation experiments validate the efficacy of multi-task learning and probe the specific impact of combining mutual information maximization with Transformer in MSA.</p>\",\"PeriodicalId\":56119,\"journal\":{\"name\":\"Journal of Intelligent Information Systems\",\"volume\":\"16 1\",\"pages\":\"\"},\"PeriodicalIF\":2.3000,\"publicationDate\":\"2024-07-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Intelligent Information Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1007/s10844-024-00858-9\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Intelligent Information Systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s10844-024-00858-9","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Multi-task learning and mutual information maximization with crossmodal transformer for multimodal sentiment analysis
The effectiveness of multimodal sentiment analysis (MSA) hinges on the seamless integration of information from diverse modalities, and the quality of modality fusion directly influences sentiment analysis accuracy. Prior methods often rely on intricate fusion strategies that raise computational costs and can yield inaccurate multimodal representations because of distribution gaps and information redundancy across heterogeneous modalities. This paper centers on the backpropagation of loss and introduces a Transformer-based model called Multi-Task Learning and Mutual Information Maximization with Crossmodal Transformer (MMMT). To address inaccurate multimodal representations in MSA, MMMT combines mutual information maximization with a crossmodal Transformer so that more modality-invariant information is carried into the multimodal representation, fully exploiting commonalities across modalities. Notably, it uses multimodal labels for unimodal training, offering a fresh perspective on multi-task learning in MSA. Comparative experiments on the CMU-MOSI and CMU-MOSEI datasets show that MMMT improves accuracy while reducing computational burden, making it suitable for resource-constrained scenarios and applications that require real-time performance. Ablation experiments further validate the efficacy of multi-task learning and probe the specific impact of combining mutual information maximization with the Transformer in MSA.
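The abstract gives no implementation details, so the following is only a minimal illustrative sketch of the two ingredients it names, crossmodal attention and a mutual-information objective combined with multi-task losses, written in PyTorch. All module names, tensor shapes, the InfoNCE-style MI estimator, and the loss weighting are assumptions for illustration, not the authors' MMMT implementation.

```python
# Hedged sketch: crossmodal attention + InfoNCE-style MI lower bound + multi-task losses.
# Every name, dimension, and weight here is an illustrative assumption.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossmodalBlock(nn.Module):
    """One modality's sequence queries another modality's keys/values (crossmodal Transformer idea)."""
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, query_seq, context_seq):
        fused, _ = self.attn(query_seq, context_seq, context_seq)
        return self.norm(query_seq + fused)  # residual + norm, as in a standard Transformer layer

def infonce_mi_lower_bound(z_a, z_b, temperature=0.1):
    """Maximizing this quantity pushes up a contrastive lower bound on I(z_a; z_b)."""
    z_a, z_b = F.normalize(z_a, dim=-1), F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature                     # (batch, batch) similarity matrix
    targets = torch.arange(z_a.size(0), device=z_a.device)   # matching pairs lie on the diagonal
    return -F.cross_entropy(logits, targets)

def total_loss(pred_multi, pred_text, pred_audio, label, z_text, z_fused, mi_weight=0.1):
    """Multi-task objective: the multimodal label supervises both the fused head and the unimodal
    heads, while the MI term encourages modality-invariant information in the fused representation."""
    task = F.l1_loss(pred_multi, label)                                   # main sentiment regression
    uni = F.l1_loss(pred_text, label) + F.l1_loss(pred_audio, label)      # unimodal tasks reuse the label
    mi = infonce_mi_lower_bound(z_text, z_fused)
    return task + uni - mi_weight * mi                                    # subtract: we maximize the MI bound
```

The InfoNCE term above merely stands in for whatever mutual-information estimator the paper actually uses; the point of the sketch is how such a term and the label-sharing unimodal tasks can be folded into a single backpropagated loss.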
Journal introduction:
The mission of the Journal of Intelligent Information Systems: Integrating Artificial Intelligence and Database Technologies is to foster and present research and development results focused on the integration of artificial intelligence and database technologies to create next-generation information systems - Intelligent Information Systems.
These new information systems embody knowledge that allows them to exhibit intelligent behavior, cooperate with users and other systems in problem solving, discovery, access, retrieval and manipulation of a wide variety of multimedia data and knowledge, and reason under uncertainty. Increasingly, knowledge-directed inference processes are being used to:
discover knowledge from large data collections,
provide cooperative support to users in complex query formulation and refinement,
access, retrieve, store and manage large collections of multimedia data and knowledge,
integrate information from multiple heterogeneous data and knowledge sources, and
reason about information under uncertain conditions.
Multimedia and hypermedia information systems now operate on a global scale over the Internet, and new tools and techniques are needed to manage these dynamic and evolving information spaces.
The Journal of Intelligent Information Systems provides a forum wherein academics, researchers and practitioners may publish high-quality, original and state-of-the-art papers describing theoretical aspects, systems architectures, analysis and design tools and techniques, and implementation experiences in intelligent information systems. The categories of papers published by JIIS include: research papers, invited papers, meeting, workshop and conference announcements and reports, survey and tutorial articles, and book reviews. Short articles describing open problems or their solutions are also welcome.