{"title":"跨模态BERT模型在心理社会网络中增强多模态情感分析。","authors":"Jian Feng","doi":"10.1186/s40359-025-03443-z","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Human emotions in psychological social networks often involve complex interactions across multiple modalities. Information derived from various channels can synergistically complement one another, leading to a more nuanced depiction of an individual's emotional landscape. Multimodal sentiment analysis emerges as a potent tool to process this diverse array of content, facilitating efficient amalgamation of emotions and quantification of emotional intensity.</p><p><strong>Methods: </strong>This paper proposes a cross-modal BERT model and a cross-modal psychological-emotional fusion (CPEF) model for sentiment analysis, integrating visual, audio, and textual modalities. The model initially processes images and audio through dedicated sub-networks for feature extraction and reduction. These features are then passed through the Masked Multimodal Attention (MMA) module, which amalgamates image and audio features via self-attention, yielding a bimodal attention matrix. Subsequently, textual information is fed into the MMA module, undergoing feature extraction through a pre-trained BERT model. The textual information is then fused with the bimodal attention matrix via the pre-trained BERT model, facilitating emotional fusion across modalities.</p><p><strong>Results: </strong>The experimental results on the CMU-MOSEI dataset showcase the effectiveness of the proposed CPEF model, outperforming comparative models, achieving an impressive accuracy rate of 83.9% and F1 Score of 84.1%, notably improving the quantification of negative, neutral, and positive affective energy.</p><p><strong>Conclusions: </strong>Such advancements contribute to the precise detection of mental health status and the cultivation of a positive and sustainable social network environment.</p>","PeriodicalId":37867,"journal":{"name":"BMC Psychology","volume":"13 1","pages":"1081"},"PeriodicalIF":3.0000,"publicationDate":"2025-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12482056/pdf/","citationCount":"0","resultStr":"{\"title\":\"Cross-modal BERT model for enhanced multimodal sentiment analysis in psychological social networks.\",\"authors\":\"Jian Feng\",\"doi\":\"10.1186/s40359-025-03443-z\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>Human emotions in psychological social networks often involve complex interactions across multiple modalities. Information derived from various channels can synergistically complement one another, leading to a more nuanced depiction of an individual's emotional landscape. Multimodal sentiment analysis emerges as a potent tool to process this diverse array of content, facilitating efficient amalgamation of emotions and quantification of emotional intensity.</p><p><strong>Methods: </strong>This paper proposes a cross-modal BERT model and a cross-modal psychological-emotional fusion (CPEF) model for sentiment analysis, integrating visual, audio, and textual modalities. The model initially processes images and audio through dedicated sub-networks for feature extraction and reduction. These features are then passed through the Masked Multimodal Attention (MMA) module, which amalgamates image and audio features via self-attention, yielding a bimodal attention matrix. 
Subsequently, textual information is fed into the MMA module, undergoing feature extraction through a pre-trained BERT model. The textual information is then fused with the bimodal attention matrix via the pre-trained BERT model, facilitating emotional fusion across modalities.</p><p><strong>Results: </strong>The experimental results on the CMU-MOSEI dataset showcase the effectiveness of the proposed CPEF model, outperforming comparative models, achieving an impressive accuracy rate of 83.9% and F1 Score of 84.1%, notably improving the quantification of negative, neutral, and positive affective energy.</p><p><strong>Conclusions: </strong>Such advancements contribute to the precise detection of mental health status and the cultivation of a positive and sustainable social network environment.</p>\",\"PeriodicalId\":37867,\"journal\":{\"name\":\"BMC Psychology\",\"volume\":\"13 1\",\"pages\":\"1081\"},\"PeriodicalIF\":3.0000,\"publicationDate\":\"2025-09-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12482056/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"BMC Psychology\",\"FirstCategoryId\":\"102\",\"ListUrlMain\":\"https://doi.org/10.1186/s40359-025-03443-z\",\"RegionNum\":3,\"RegionCategory\":\"心理学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"PSYCHOLOGY, MULTIDISCIPLINARY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"BMC Psychology","FirstCategoryId":"102","ListUrlMain":"https://doi.org/10.1186/s40359-025-03443-z","RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"PSYCHOLOGY, MULTIDISCIPLINARY","Score":null,"Total":0}
Cross-modal BERT model for enhanced multimodal sentiment analysis in psychological social networks.
Background: Human emotions in psychological social networks often involve complex interactions across multiple modalities. Information derived from different channels can complement one another, leading to a more nuanced depiction of an individual's emotional landscape. Multimodal sentiment analysis is a potent tool for processing this diverse content, enabling efficient fusion of emotional cues and quantification of emotional intensity.
Methods: This paper proposes a cross-modal BERT model and a cross-modal psychological-emotional fusion (CPEF) model for sentiment analysis that integrates visual, audio, and textual modalities. The model first processes images and audio through dedicated sub-networks for feature extraction and dimensionality reduction. These features are then passed to the Masked Multimodal Attention (MMA) module, which fuses image and audio features via self-attention, yielding a bimodal attention matrix. Textual information is then encoded by a pre-trained BERT model and fused with the bimodal attention matrix, enabling emotional fusion across modalities.
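To make the described pipeline concrete, below is a minimal PyTorch sketch of one plausible reading of the architecture. All class names (ImageSubNet, AudioSubNet, MaskedMultimodalAttention, CPEFSketch), feature dimensions, and the cross-attention fusion step are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of the abstract's pipeline; dimensions and module names
# are assumptions, not the published CPEF implementation.
import torch
import torch.nn as nn
from transformers import BertModel

class ImageSubNet(nn.Module):
    """Dedicated sub-network: projects a raw image feature vector down to d_model."""
    def __init__(self, in_dim=2048, d_model=768):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(in_dim, d_model), nn.ReLU(), nn.LayerNorm(d_model))
    def forward(self, x):
        return self.proj(x)

class AudioSubNet(nn.Module):
    """Dedicated sub-network: projects a raw audio feature vector down to d_model."""
    def __init__(self, in_dim=74, d_model=768):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(in_dim, d_model), nn.ReLU(), nn.LayerNorm(d_model))
    def forward(self, x):
        return self.proj(x)

class MaskedMultimodalAttention(nn.Module):
    """Fuses image and audio features via self-attention into a bimodal attention matrix."""
    def __init__(self, d_model=768, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
    def forward(self, img, aud):
        # Stack the two modalities as a length-2 sequence so they attend to each other.
        bimodal = torch.stack([img, aud], dim=1)          # (B, 2, d_model)
        fused, _ = self.attn(bimodal, bimodal, bimodal)   # bimodal representation
        return fused

class CPEFSketch(nn.Module):
    """Encodes text with pre-trained BERT and fuses it with the bimodal matrix."""
    def __init__(self, n_classes=3):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.image_net, self.audio_net = ImageSubNet(), AudioSubNet()
        self.mma = MaskedMultimodalAttention()
        self.cross = nn.MultiheadAttention(768, 8, batch_first=True)
        self.head = nn.Linear(768, n_classes)
    def forward(self, input_ids, attention_mask, img_feat, aud_feat):
        text = self.bert(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        bimodal = self.mma(self.image_net(img_feat), self.audio_net(aud_feat))
        # Text tokens attend to the bimodal matrix -- one plausible fusion choice.
        fused, _ = self.cross(text, bimodal, bimodal)
        return self.head(fused[:, 0])                      # [CLS]-position logits
```

Cross-attention from text to the bimodal matrix is only one way to realize the "fusion via the pre-trained BERT model" described above; the paper itself should be consulted for the exact mechanism.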
Results: Experiments on the CMU-MOSEI dataset demonstrate the effectiveness of the proposed CPEF model: it outperforms the comparison models, achieving an accuracy of 83.9% and an F1 score of 84.1%, and notably improves the quantification of negative, neutral, and positive affective energy.
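For readers unfamiliar with the reported metrics, the snippet below shows one way such accuracy and F1 figures could be computed for a three-class (negative / neutral / positive) setup. The label granularity, averaging scheme, and example labels are assumptions, since the abstract does not specify them.

```python
# Minimal metric sketch; the 3-class labels and weighted averaging are assumptions.
from sklearn.metrics import accuracy_score, f1_score

y_true = [0, 1, 2, 2, 1, 0]   # hypothetical gold sentiment labels
y_pred = [0, 1, 2, 1, 1, 0]   # hypothetical model predictions

acc = accuracy_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred, average="weighted")
print(f"Accuracy: {acc:.1%}  Weighted F1: {f1:.1%}")
```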
Conclusions: Such advancements contribute to the precise detection of mental health status and the cultivation of a positive and sustainable social network environment.
Journal description:
BMC Psychology is an open access, peer-reviewed journal that considers manuscripts on all aspects of psychology, human behavior and the mind, including developmental, clinical, cognitive, experimental, health and social psychology, as well as personality and individual differences. The journal welcomes quantitative and qualitative research methods, including animal studies.