An explainable framework for assisting the detection of AI-generated textual content

IF 6.7 | CAS Tier 1, Computer Science | JCR Q1, Computer Science, Artificial Intelligence
Sen Yan, Zhiyi Wang, David Dobolyi
{"title":"An explainable framework for assisting the detection of AI-generated textual content","authors":"Sen Yan,&nbsp;Zhiyi Wang,&nbsp;David Dobolyi","doi":"10.1016/j.dss.2025.114498","DOIUrl":null,"url":null,"abstract":"<div><div>The recent development of generative AI (GenAI) algorithms has allowed machines to create new content in a realistic way, driving the spread of AI-generated content (AIGC) on the Internet. However, generative AI models and AIGC have exacerbated several societal challenges such as security threats (e.g., misinformation), trust issues, ethical concerns, and intellectual property regulation, calling for effective detection methods and a better understanding of AI-generated vs. human-written content. In this paper, we focus on AI-generated texts produced by large language models (LLMs) and extend prior detection methods by proposing a novel framework that combines semantic information and linguistic features. Based on potential semantic and linguistic differences in AI vs. human writing, we design our Semantic-Linguistic-Detector (SemLinDetector) framework by integrating a transformer-based semantic encoder and a linguistic encoder with parallel linguistic representations. By comparing a series of benchmark models on datasets collected from various LLMs and human writers in multiple domains, our experiments show that the proposed detection framework outperforms other benchmarks in a consistent and robust manner. Moreover, our model interpretability analysis showcases our framework's potential to help understand the reasoning behind prediction outcomes and identify patterns of differences in AI-generated and human-written content. Our research adds to the growing space of GenAI by proposing an effective and responsible detection system to address the risks and challenges of GenAI, offering implications for researchers and practitioners to better understand and regulate AIGC.</div></div>","PeriodicalId":55181,"journal":{"name":"Decision Support Systems","volume":"196 ","pages":"Article 114498"},"PeriodicalIF":6.7000,"publicationDate":"2025-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Decision Support Systems","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0167923625000995","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

The recent development of generative AI (GenAI) algorithms has allowed machines to create new content in a realistic way, driving the spread of AI-generated content (AIGC) on the Internet. However, generative AI models and AIGC have exacerbated several societal challenges such as security threats (e.g., misinformation), trust issues, ethical concerns, and intellectual property regulation, calling for effective detection methods and a better understanding of AI-generated vs. human-written content. In this paper, we focus on AI-generated texts produced by large language models (LLMs) and extend prior detection methods by proposing a novel framework that combines semantic information and linguistic features. Based on potential semantic and linguistic differences in AI vs. human writing, we design our Semantic-Linguistic-Detector (SemLinDetector) framework by integrating a transformer-based semantic encoder and a linguistic encoder with parallel linguistic representations. By comparing a series of benchmark models on datasets collected from various LLMs and human writers in multiple domains, our experiments show that the proposed detection framework outperforms other benchmarks in a consistent and robust manner. Moreover, our model interpretability analysis showcases our framework's potential to help understand the reasoning behind prediction outcomes and identify patterns of differences in AI-generated and human-written content. Our research adds to the growing space of GenAI by proposing an effective and responsible detection system to address the risks and challenges of GenAI, offering implications for researchers and practitioners to better understand and regulate AIGC.
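The abstract describes a two-branch design: a transformer-based semantic encoder and a linguistic encoder over parallel linguistic feature representations, whose outputs are combined to classify text as AI-generated or human-written. The sketch below is a minimal illustration of that general idea, not the paper's implementation; the class name SemLinDetectorSketch, all dimensions, the generic nn.TransformerEncoder used as a stand-in for the pretrained semantic encoder, the mean-pooling step, and the concatenation-based fusion are assumptions.

```python
import torch
import torch.nn as nn

class SemLinDetectorSketch(nn.Module):
    """Illustrative two-branch detector: a semantic branch (transformer encoder)
    and a linguistic branch (MLP over hand-crafted linguistic features),
    fused by concatenation before a binary AI-vs-human classification head.
    Module names and dimensions are assumptions, not the paper's."""

    def __init__(self, vocab_size=30522, d_model=256, n_heads=4, n_layers=2,
                 n_ling_features=32, d_ling=64, n_classes=2):
        super().__init__()
        # Semantic branch: token embeddings + transformer encoder (a stand-in for
        # the pretrained transformer-based semantic encoder described in the abstract).
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=512, batch_first=True)
        self.semantic_encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Linguistic branch: encodes pre-computed linguistic feature vectors
        # (e.g., readability, lexical diversity, POS ratios) into a dense representation.
        self.linguistic_encoder = nn.Sequential(
            nn.Linear(n_ling_features, d_ling), nn.ReLU(),
            nn.Linear(d_ling, d_ling), nn.ReLU(),
        )
        # Fusion + classification head over the concatenated representations.
        self.classifier = nn.Linear(d_model + d_ling, n_classes)

    def forward(self, token_ids, ling_features):
        # token_ids: (batch, seq_len) integer token ids
        # ling_features: (batch, n_ling_features) pre-computed linguistic features
        h = self.semantic_encoder(self.embed(token_ids))        # (batch, seq_len, d_model)
        sem = h.mean(dim=1)                                     # mean-pool to a document vector
        lin = self.linguistic_encoder(ling_features)            # (batch, d_ling)
        return self.classifier(torch.cat([sem, lin], dim=-1))   # (batch, n_classes) logits


if __name__ == "__main__":
    model = SemLinDetectorSketch()
    tokens = torch.randint(0, 30522, (4, 128))   # dummy token ids
    feats = torch.randn(4, 32)                    # dummy linguistic features
    print(model(tokens, feats).shape)             # torch.Size([4, 2])
```

In a full pipeline, the semantic branch would typically be a pretrained language model and the linguistic features would come from a separate feature-extraction step applied to the raw text before training.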
Source Journal
Decision Support Systems
Engineering & Technology: Computer Science, Artificial Intelligence
CiteScore: 14.70
Self-citation rate: 6.70%
Articles per year: 119
Review time: 13 months
Journal description: The common thread of articles published in Decision Support Systems is their relevance to theoretical and technical issues in the support of enhanced decision making. The areas addressed may include foundations, functionality, interfaces, implementation, impacts, and evaluation of decision support systems (DSSs).