韩文情感分析的多模态特征学习方法

IF 8.4 1区计算机科学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Transactions on Multimedia Pub Date : 2024-12-24 DOI:10.1109/TMM.2024.3521707

Tae-Young Kim;Jufeng Yang;Eunil Park

{"title":"韩文情感分析的多模态特征学习方法","authors":"Tae-Young Kim;Jufeng Yang;Eunil Park","doi":"10.1109/TMM.2024.3521707","DOIUrl":null,"url":null,"abstract":"Recently, sentiment analysis research has made significant improvements in addressing sentiment and subjectivity within textual content. The advent of multimodal deep learning techniques has further broadened this scope, enabling the integration of diverse modalities such as voice and image features alongside text. However, despite these advancements, the analysis of the Korean language remains challenging due to its inherently agglutinative nature and linguistic ambiguity, primarily examined at the sentence level. To effectively address this challenge, we propose a novel Multimodal Sentimental Deep Learning Framework for Korean (MSDLF-K), which can examine not only Korean text but also its associated speech. Our framework, MSDLF-K, integrates spectrograms and waveforms from Korean voice data with embedding vectors derived from script sentences, creating a unified multimodal representation. This approach facilitates the identification of both shared and unique features within the latent space, thereby offering valuable insights into their respective impacts on sentiment analysis performance. To validate the efficacy of MSDLF-K, we conducted a set of experiments using the emotion speech synthesis dataset. Our findings demonstrate that MSDLF-K achieves a remarkable accuracy of 79.0% in valence and 81.7% in arousal for emotion classification, metrics previously unexplored in the literature. Furthermore, empirical analysis reveals the significant influence of multimodal representations, encompassing both text and voice, on enhancing emotion analysis performance. In summary, our study not only presents a pioneering solution for sentiment analysis in the Korean language but also underscores the importance of incorporating multimodal approaches for more comprehensive and accurate sentiment analysis across diverse linguistic contexts.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"27 ","pages":"1266-1276"},"PeriodicalIF":8.4000,"publicationDate":"2024-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"MSDLF-K: A Multimodal Feature Learning Approach for Sentiment Analysis in Korean Incorporating Text and Speech\",\"authors\":\"Tae-Young Kim;Jufeng Yang;Eunil Park\",\"doi\":\"10.1109/TMM.2024.3521707\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recently, sentiment analysis research has made significant improvements in addressing sentiment and subjectivity within textual content. The advent of multimodal deep learning techniques has further broadened this scope, enabling the integration of diverse modalities such as voice and image features alongside text. However, despite these advancements, the analysis of the Korean language remains challenging due to its inherently agglutinative nature and linguistic ambiguity, primarily examined at the sentence level. To effectively address this challenge, we propose a novel Multimodal Sentimental Deep Learning Framework for Korean (MSDLF-K), which can examine not only Korean text but also its associated speech. Our framework, MSDLF-K, integrates spectrograms and waveforms from Korean voice data with embedding vectors derived from script sentences, creating a unified multimodal representation. This approach facilitates the identification of both shared and unique features within the latent space, thereby offering valuable insights into their respective impacts on sentiment analysis performance. To validate the efficacy of MSDLF-K, we conducted a set of experiments using the emotion speech synthesis dataset. Our findings demonstrate that MSDLF-K achieves a remarkable accuracy of 79.0% in valence and 81.7% in arousal for emotion classification, metrics previously unexplored in the literature. Furthermore, empirical analysis reveals the significant influence of multimodal representations, encompassing both text and voice, on enhancing emotion analysis performance. In summary, our study not only presents a pioneering solution for sentiment analysis in the Korean language but also underscores the importance of incorporating multimodal approaches for more comprehensive and accurate sentiment analysis across diverse linguistic contexts.\",\"PeriodicalId\":13273,\"journal\":{\"name\":\"IEEE Transactions on Multimedia\",\"volume\":\"27 \",\"pages\":\"1266-1276\"},\"PeriodicalIF\":8.4000,\"publicationDate\":\"2024-12-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Multimedia\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10814984/\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Multimedia","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10814984/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 0

摘要

近年来，情感分析研究在处理文本内容中的情感和主观性方面取得了重大进展。多模态深度学习技术的出现进一步扩大了这一范围，使语音和图像特征与文本等多种模式的集成成为可能。然而，尽管取得了这些进步，由于其固有的粘连性和语言模糊性，韩语的分析仍然具有挑战性，主要是在句子层面进行检查。为了有效地应对这一挑战，我们提出了一种新的韩国语多模态情感深度学习框架（MSDLF-K），它不仅可以检查韩国语文本，还可以检查其相关语音。我们的框架MSDLF-K将韩语语音数据的频谱图和波形与来自脚本句子的嵌入向量集成在一起，创建了统一的多模态表示。这种方法有助于识别潜在空间内的共享和独特特征，从而为它们各自对情感分析性能的影响提供有价值的见解。为了验证MSDLF-K的有效性，我们使用情绪语音合成数据集进行了一组实验。我们的研究结果表明，MSDLF-K在情绪分类的效价和唤醒方面达到了79.0%的显着准确性，这是以前文献中未探索的指标。此外，实证分析还揭示了包括文本和语音在内的多模态表征对提高情感分析绩效的显著影响。综上所述，我们的研究不仅为韩语情感分析提供了一个开创性的解决方案，而且强调了将多模态方法结合起来，在不同的语言语境中进行更全面、更准确的情感分析的重要性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

MSDLF-K: A Multimodal Feature Learning Approach for Sentiment Analysis in Korean Incorporating Text and Speech

Recently, sentiment analysis research has made significant improvements in addressing sentiment and subjectivity within textual content. The advent of multimodal deep learning techniques has further broadened this scope, enabling the integration of diverse modalities such as voice and image features alongside text. However, despite these advancements, the analysis of the Korean language remains challenging due to its inherently agglutinative nature and linguistic ambiguity, primarily examined at the sentence level. To effectively address this challenge, we propose a novel Multimodal Sentimental Deep Learning Framework for Korean (MSDLF-K), which can examine not only Korean text but also its associated speech. Our framework, MSDLF-K, integrates spectrograms and waveforms from Korean voice data with embedding vectors derived from script sentences, creating a unified multimodal representation. This approach facilitates the identification of both shared and unique features within the latent space, thereby offering valuable insights into their respective impacts on sentiment analysis performance. To validate the efficacy of MSDLF-K, we conducted a set of experiments using the emotion speech synthesis dataset. Our findings demonstrate that MSDLF-K achieves a remarkable accuracy of 79.0% in valence and 81.7% in arousal for emotion classification, metrics previously unexplored in the literature. Furthermore, empirical analysis reveals the significant influence of multimodal representations, encompassing both text and voice, on enhancing emotion analysis performance. In summary, our study not only presents a pioneering solution for sentiment analysis in the Korean language but also underscores the importance of incorporating multimodal approaches for more comprehensive and accurate sentiment analysis across diverse linguistic contexts.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

IEEE Transactions on Multimedia 工程技术-电信学

CiteScore

11.70

自引率

11.00%

发文量

576

审稿时长

5.5 months

期刊介绍： The IEEE Transactions on Multimedia delves into diverse aspects of multimedia technology and applications, covering circuits, networking, signal processing, systems, software, and systems integration. The scope aligns with the Fields of Interest of the sponsors, ensuring a comprehensive exploration of research in multimedia.