Decoding synthetic news: an interpretable multimodal framework for the classification of news articles in a novel news corpus

IF 13.9 · Zone 2 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Artificial Intelligence Review · Pub Date: 2025-07-07 · DOI: 10.1007/s10462-025-11188-9

Michael Schlee, Gillian Kant, Christoph Ehrling, Benjamin Säfken, Thomas Kneib

Citations: 0

Abstract

Decoding synthetic news: an interpretable multimodal framework for the classification of news articles in a novel news corpus

Recent advancements in Artificial Intelligence (AI), notably the development of Large Language Models (LLMs) and text-to-image diffusion models, have facilitated the creation of realistic textual content and images. Specifically, platforms like ChatGPT and Midjourney have simplified the creation of high-quality text and visuals with minimal expertise and cost. The increasing sophistication of Generative AI presents challenges in ensuring the integrity of news, media, and information quality, making it increasingly difficult to distinguish between real and artificially generated textual and visual content. Our work addressed this problem in two ways. First, by means of ChatGPT and Midjourney, we created a comprehensive novel multimodal news corpus named SyN24News based on the N24News corpus, on which we evaluated our model. Second, we developed a novel explainable synthetic news detector for discriminating between real and synthetic news articles. We leveraged a Neural Additive Model (NAM)-like network structure that ensures effect separation by handling input data in separate subnetworks. Deep features capturing complex structures and patterns are extracted from unstructured data, i.e., images and texts, using fine-tuned VGG and DistilBERT subnetworks. We ensured further explainability by individually processing carefully chosen handcrafted text and image features in simple Multilayer Perceptrons (MLPs), allowing for graphical interpretation of the corresponding structured effects. Our findings indicate that textual information is the main driver of the decision-making process. Structured textual effects, particularly Flesch-Kincaid reading ease and sentiment, have a much higher influence on the classification outcome than visual features such as dissimilarity and homogeneity.
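The effect separation described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes each handcrafted feature is passed through its own one-input MLP, that the deep text and image subnetworks (DistilBERT and VGG in the paper) each contribute a single scalar, and that all contributions are summed before a sigmoid. The class names `TinyMLP` and `AdditiveDetector`, the feature keys, the random untrained weights, and the placeholder deep logits are all hypothetical.

```python
import math
import random


class TinyMLP:
    """One-input, one-output MLP with a single tanh hidden layer (untrained)."""

    def __init__(self, hidden: int, rng: random.Random):
        self.w1 = [rng.uniform(-1, 1) for _ in range(hidden)]
        self.b1 = [rng.uniform(-1, 1) for _ in range(hidden)]
        self.w2 = [rng.uniform(-1, 1) for _ in range(hidden)]
        self.b2 = 0.0

    def __call__(self, x: float) -> float:
        return self.b2 + sum(
            w2 * math.tanh(w1 * x + b1)
            for w1, b1, w2 in zip(self.w1, self.b1, self.w2)
        )


class AdditiveDetector:
    """NAM-like structure: one subnetwork per structured feature, summed logits."""

    def __init__(self, feature_names, hidden: int = 4, seed: int = 0):
        rng = random.Random(seed)
        self.subnets = {name: TinyMLP(hidden, rng) for name in feature_names}
        self.bias = 0.0

    def contributions(self, features: dict, deep_logits: dict) -> dict:
        # Each structured feature only ever passes through its own subnetwork,
        # so its contribution can be plotted against the feature value alone.
        parts = {name: net(features[name]) for name, net in self.subnets.items()}
        # Placeholder scalars standing in for the deep text/image subnetworks.
        parts.update(deep_logits)
        return parts

    def predict_proba(self, features: dict, deep_logits: dict) -> float:
        logit = self.bias + sum(self.contributions(features, deep_logits).values())
        return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical, already-scaled feature values for one article.
model = AdditiveDetector(["reading_ease", "sentiment", "dissimilarity", "homogeneity"])
features = {"reading_ease": 0.6, "sentiment": 0.1, "dissimilarity": 0.4, "homogeneity": 0.7}
deep = {"text_deep": 0.3, "image_deep": -0.1}

for name, value in model.contributions(features, deep).items():
    print(f"{name:>14}: {value:+.3f}")
print("P(synthetic) =", round(model.predict_proba(features, deep), 3))
```

Because the final logit is a plain sum, the per-feature contributions printed above are exactly the quantities one would plot to inspect a structured effect; changing one feature leaves every other contribution untouched.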

Source journal

Artificial Intelligence Review (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore: 22.00

Self-citation rate: 3.30%

Articles published: 194

Review time: 5.3 months

About the journal: Artificial Intelligence Review, a fully open access journal, publishes cutting-edge research in artificial intelligence and cognitive science. It features critical evaluations of applications, techniques, and algorithms, providing a platform for both researchers and application developers. The journal includes refereed survey and tutorial articles, along with reviews and commentary on significant developments in the field.