{"title":"UniTextFusion:使用早期融合和lora调优的语言模型进行阿拉伯语多模态情感分析的低资源框架","authors":"Salma Khaled , Walaa Medhat , Ensaf Hussein Mohamed","doi":"10.1016/j.asej.2025.103682","DOIUrl":null,"url":null,"abstract":"<div><div>Multimodal Sentiment Analysis (MuSA) seeks to interpret human emotions by combining textual, auditory, and visual cues. While this field has advanced significantly in English, Arabic MuSA remains underdeveloped due to limited large language models (LLMs), scarce annotated datasets, dialectal variation, and the complexity of fusing multiple modalities. Cultural elements such as sarcasm and emotional nuance are particularly difficult to capture without multimodal context. An early fusion approach, UniTextFusion, is introduced as a means of overcoming these challenges. This fusion strategy transforms audio and visual inputs into descriptive text, allowing seamless integration with Arabic-compatible LLMs. We apply parameter-efficient Low-Rank Adaption (LoRA) fine-tuning to two generative models—LLaMA 3.1-8B Instruct and SILMA AI 9B. Experiments on our Arabic MuSA dataset show that UniTextFusion improves sentiment classification performance by up to 34% in F1-score over strong unimodal and multimodal baselines, reaching 68% with LLaMA and 71% with SILMA. 
These results validate our hypothesis that modality textualization combined with lightweight fine-tuning is effective for Arabic MuSA and offers a scalable solution for sentiment analysis in low-resource settings.</div></div>","PeriodicalId":48648,"journal":{"name":"Ain Shams Engineering Journal","volume":"16 11","pages":"Article 103682"},"PeriodicalIF":5.9000,"publicationDate":"2025-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"UniTextFusion: A low-resource framework for Arabic multimodal sentiment analysis using early fusion and LoRA-tuned language models\",\"authors\":\"Salma Khaled , Walaa Medhat , Ensaf Hussein Mohamed\",\"doi\":\"10.1016/j.asej.2025.103682\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Multimodal Sentiment Analysis (MuSA) seeks to interpret human emotions by combining textual, auditory, and visual cues. While this field has advanced significantly in English, Arabic MuSA remains underdeveloped due to limited large language models (LLMs), scarce annotated datasets, dialectal variation, and the complexity of fusing multiple modalities. Cultural elements such as sarcasm and emotional nuance are particularly difficult to capture without multimodal context. An early fusion approach, UniTextFusion, is introduced as a means of overcoming these challenges. This fusion strategy transforms audio and visual inputs into descriptive text, allowing seamless integration with Arabic-compatible LLMs. We apply parameter-efficient Low-Rank Adaption (LoRA) fine-tuning to two generative models—LLaMA 3.1-8B Instruct and SILMA AI 9B. Experiments on our Arabic MuSA dataset show that UniTextFusion improves sentiment classification performance by up to 34% in F1-score over strong unimodal and multimodal baselines, reaching 68% with LLaMA and 71% with SILMA. 
These results validate our hypothesis that modality textualization combined with lightweight fine-tuning is effective for Arabic MuSA and offers a scalable solution for sentiment analysis in low-resource settings.</div></div>\",\"PeriodicalId\":48648,\"journal\":{\"name\":\"Ain Shams Engineering Journal\",\"volume\":\"16 11\",\"pages\":\"Article 103682\"},\"PeriodicalIF\":5.9000,\"publicationDate\":\"2025-08-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Ain Shams Engineering Journal\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S209044792500423X\",\"RegionNum\":2,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, MULTIDISCIPLINARY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Ain Shams Engineering Journal","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S209044792500423X","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, MULTIDISCIPLINARY","Score":null,"Total":0}
Citations: 0
Abstract
Multimodal Sentiment Analysis (MuSA) seeks to interpret human emotions by combining textual, auditory, and visual cues. While this field has advanced significantly in English, Arabic MuSA remains underdeveloped due to limited large language models (LLMs), scarce annotated datasets, dialectal variation, and the complexity of fusing multiple modalities. Cultural elements such as sarcasm and emotional nuance are particularly difficult to capture without multimodal context. An early fusion approach, UniTextFusion, is introduced as a means of overcoming these challenges. This fusion strategy transforms audio and visual inputs into descriptive text, allowing seamless integration with Arabic-compatible LLMs. We apply parameter-efficient Low-Rank Adaptation (LoRA) fine-tuning to two generative models—LLaMA 3.1-8B Instruct and SILMA AI 9B. Experiments on our Arabic MuSA dataset show that UniTextFusion improves sentiment classification performance by up to 34% in F1-score over strong unimodal and multimodal baselines, reaching 68% with LLaMA and 71% with SILMA. These results validate our hypothesis that modality textualization combined with lightweight fine-tuning is effective for Arabic MuSA and offers a scalable solution for sentiment analysis in low-resource settings.
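The early-fusion strategy the abstract describes can be illustrated with a minimal sketch. Once audio and visual inputs have been converted to descriptive text by upstream models, the fusion step reduces to composing a single prompt for the LLM; the function name, field labels, and example descriptions below are hypothetical illustrations, not the paper's actual prompt template.

```python
# Minimal sketch of early fusion by modality textualization:
# the three modality descriptions are concatenated into one
# text prompt that any Arabic-compatible LLM can consume.

def textualize_fuse(transcript: str, audio_desc: str, visual_desc: str) -> str:
    """Combine per-modality text descriptions into a single LLM prompt."""
    return (
        "Classify the sentiment of the following utterance.\n"
        f"Transcript: {transcript}\n"
        f"Audio cues: {audio_desc}\n"
        f"Visual cues: {visual_desc}\n"
        "Sentiment:"
    )

prompt = textualize_fuse(
    "ما شاء الله، عمل رائع",             # Arabic transcript
    "warm tone, moderate pitch",          # textualized audio features
    "smiling speaker, relaxed posture",   # textualized visual features
)
```

Because the fused input is plain text, parameter-efficient LoRA fine-tuning can then be applied to the language model exactly as in any text-only classification setup, which is what makes the approach attractive in low-resource settings.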
Journal introduction:
Ain Shams Engineering Journal is an international journal devoted to the publication of peer-reviewed, original, high-quality research papers and review papers on both traditional topics and emerging science and technology. Work of theoretical and fundamental interest is welcome, as are contributions concerning industrial applications, emerging instrumental techniques, and topics with practical application to human endeavor, such as environmental preservation, health, and waste disposal. The overall focus is on original and rigorous scientific research results of generic significance.
Ain Shams Engineering Journal focuses on aspects of mechanical engineering, electrical engineering, civil engineering, chemical engineering, petroleum engineering, environmental engineering, and architectural and urban planning engineering. Papers that integrate knowledge from other disciplines with engineering are especially welcome, including nanotechnology, materials science, and computational methods, as well as the applied basic sciences: engineering mathematics, physics, and chemistry.