{"title":"MDCNN:一种多模态双cnn递归模型,用于基于音频和文本的语音情感识别的假新闻检测","authors":"Hongchen Wu, Hongxuan Li, Xiaochang Fang, Mengqi Tang, Hongzhu Yu, Bing Yu, Meng Li, Zhaorong Jing, Yihong Meng, Wei Chen, Yu Liu, Chenfei Sun, Shuang Gao, Huaxiang Zhang","doi":"10.1016/j.specom.2025.103313","DOIUrl":null,"url":null,"abstract":"<div><div>The increasing complexity and diversity of emotional expression pose challenges when identifying fake news conveyed through text and audio formats. Integrating emotional cues derived from data offers a promising approach for balancing the tradeoff between the volume and quality of data. Leveraging recent advancements in speech emotion recognition (SER), our study proposes a Multimodal Recursive Dual-Convolutional Neural Network Model (MDCNN) for fake news detection, with a focus on sentiment analysis based on audio and text. Our proposed model employs convolutional layers to extract features from both audio and text inputs, facilitating an effective feature fusion process for sentiment classification. Through a deep bidirectional recursive encoder, the model can better understand audio and text features for determining the final emotional category. Experiments conducted on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) dataset, which contains 5531 samples across four emotion types—anger, happiness, neutrality, and sadness—demonstrate the superior performance of the MDCNN. Its weighted average precision (WAP) is 78.8 %, which is 2.5 % higher than that of the best baseline. Compared with the existing sentiment analysis models, our approach exhibits notable enhancements in terms of accurately detecting neutral categories, thereby addressing a common challenge faced by the prior models. 
These findings underscore the efficacy of the MDCNN in multimodal sentiment analysis tasks and its significant achievements in neutral category classification tasks, offering a robust solution for precisely detecting fake news and conducting nuanced emotional analyses in speech recognition scenarios.</div></div>","PeriodicalId":49485,"journal":{"name":"Speech Communication","volume":"175 ","pages":"Article 103313"},"PeriodicalIF":3.0000,"publicationDate":"2025-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"MDCNN: A multimodal dual-CNN recursive model for fake news detection via audio- and text-based speech emotion recognition\",\"authors\":\"Hongchen Wu, Hongxuan Li, Xiaochang Fang, Mengqi Tang, Hongzhu Yu, Bing Yu, Meng Li, Zhaorong Jing, Yihong Meng, Wei Chen, Yu Liu, Chenfei Sun, Shuang Gao, Huaxiang Zhang\",\"doi\":\"10.1016/j.specom.2025.103313\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>The increasing complexity and diversity of emotional expression pose challenges when identifying fake news conveyed through text and audio formats. Integrating emotional cues derived from data offers a promising approach for balancing the tradeoff between the volume and quality of data. Leveraging recent advancements in speech emotion recognition (SER), our study proposes a Multimodal Recursive Dual-Convolutional Neural Network Model (MDCNN) for fake news detection, with a focus on sentiment analysis based on audio and text. Our proposed model employs convolutional layers to extract features from both audio and text inputs, facilitating an effective feature fusion process for sentiment classification. Through a deep bidirectional recursive encoder, the model can better understand audio and text features for determining the final emotional category. 
Experiments conducted on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) dataset, which contains 5531 samples across four emotion types—anger, happiness, neutrality, and sadness—demonstrate the superior performance of the MDCNN. Its weighted average precision (WAP) is 78.8 %, which is 2.5 % higher than that of the best baseline. Compared with the existing sentiment analysis models, our approach exhibits notable enhancements in terms of accurately detecting neutral categories, thereby addressing a common challenge faced by the prior models. These findings underscore the efficacy of the MDCNN in multimodal sentiment analysis tasks and its significant achievements in neutral category classification tasks, offering a robust solution for precisely detecting fake news and conducting nuanced emotional analyses in speech recognition scenarios.</div></div>\",\"PeriodicalId\":49485,\"journal\":{\"name\":\"Speech Communication\",\"volume\":\"175 \",\"pages\":\"Article 103313\"},\"PeriodicalIF\":3.0000,\"publicationDate\":\"2025-09-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Speech Communication\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0167639325001281\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"ACOUSTICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Speech 
Communication","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0167639325001281","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ACOUSTICS","Score":null,"Total":0}
MDCNN: A multimodal dual-CNN recursive model for fake news detection via audio- and text-based speech emotion recognition
The increasing complexity and diversity of emotional expression pose challenges when identifying fake news conveyed through text and audio formats. Integrating emotional cues derived from data offers a promising approach for balancing the tradeoff between the volume and quality of data. Leveraging recent advancements in speech emotion recognition (SER), our study proposes a Multimodal Recursive Dual-Convolutional Neural Network Model (MDCNN) for fake news detection, with a focus on sentiment analysis based on audio and text. The proposed model employs convolutional layers to extract features from both audio and text inputs, facilitating an effective feature-fusion process for sentiment classification. Through a deep bidirectional recursive encoder, the model can better relate audio and text features when determining the final emotional category. Experiments conducted on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) dataset, which contains 5531 samples across four emotion types (anger, happiness, neutrality, and sadness), demonstrate the superior performance of the MDCNN: its weighted average precision (WAP) is 78.8%, 2.5% higher than that of the best baseline. Compared with existing sentiment analysis models, our approach is notably better at accurately detecting the neutral category, a common challenge for prior models. These findings underscore the efficacy of the MDCNN in multimodal sentiment analysis tasks, particularly neutral-category classification, offering a robust solution for precisely detecting fake news and conducting nuanced emotional analyses in speech recognition scenarios.
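The abstract describes per-modality convolutional feature extractors, a fusion step, and a deep bidirectional recursive encoder that produces the final four-way emotion prediction. A minimal PyTorch sketch of that general architecture follows; the layer sizes, concatenation-based fusion, time alignment of the two inputs, and mean pooling are illustrative assumptions, not the paper's exact design:

```python
import torch
import torch.nn as nn

class MDCNNSketch(nn.Module):
    """Illustrative dual-CNN + bidirectional recurrent fusion model.
    All hyperparameters here are assumptions for the sketch."""
    def __init__(self, audio_dim=40, text_dim=300, hidden=128, n_classes=4):
        super().__init__()
        # One convolutional branch per modality extracts local features.
        self.audio_cnn = nn.Conv1d(audio_dim, hidden, kernel_size=3, padding=1)
        self.text_cnn = nn.Conv1d(text_dim, hidden, kernel_size=3, padding=1)
        # Deep (2-layer) bidirectional recurrent encoder over fused features.
        self.encoder = nn.GRU(2 * hidden, hidden, num_layers=2,
                              bidirectional=True, batch_first=True)
        # 4 classes: anger, happiness, neutrality, sadness.
        self.classifier = nn.Linear(2 * hidden, n_classes)

    def forward(self, audio, text):
        # audio: (B, T, audio_dim); text: (B, T, text_dim), assumed time-aligned.
        a = torch.relu(self.audio_cnn(audio.transpose(1, 2))).transpose(1, 2)
        t = torch.relu(self.text_cnn(text.transpose(1, 2))).transpose(1, 2)
        fused = torch.cat([a, t], dim=-1)        # simple concatenation fusion
        out, _ = self.encoder(fused)             # (B, T, 2 * hidden)
        return self.classifier(out.mean(dim=1))  # pool over time -> emotion logits

model = MDCNNSketch()
logits = model(torch.randn(2, 50, 40), torch.randn(2, 50, 300))
print(logits.shape)  # torch.Size([2, 4])
```

The concatenation fusion and mean pooling are the simplest reasonable choices for a sketch; the actual MDCNN's fusion and decoding details are given in the full paper.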
About the journal:
Speech Communication is an interdisciplinary journal whose primary objective is to fulfil the need for the rapid dissemination and thorough discussion of basic and applied research results.
The journal's primary objectives are:
• to present a forum for the advancement of human and human-machine speech communication science;
• to stimulate cross-fertilization between different fields of this domain;
• to contribute towards the rapid and wide diffusion of scientifically sound contributions in this domain.