Title: Evaluating reasoning large language models on rumor generation, detection, and debunking tasks
Authors: Yejinxuan Hu, Xianyun Tian
Journal: iScience, Volume 28, Issue 11, Article 113690 (Cell Press; Q1, Multidisciplinary Sciences; Impact Factor 4.1)
DOI: 10.1016/j.isci.2025.113690
Publication date: 2025-10-04
URL: https://www.sciencedirect.com/science/article/pii/S2589004225019510
Citations: 0
Abstract
Reasoning-capable large language models (RLLMs) introduce new challenges for rumor management. While standard LLMs have been studied, the behaviors of RLLMs in rumor generation, detection, and debunking remain underexplored. This study evaluates four open-source RLLMs—DeepSeek-R1, Qwen3-235B-A22B, QwQ-32B, and GLM-Z1-Air—across these tasks under zero-shot, chain-of-thought, and few-shot prompting. Results reveal three key findings. First, RLLMs frequently complied with rumor-generation requests, rationalizing them as harmless tasks, which highlights important safety risks. Second, in rumor detection, they generally underperformed traditional baselines, with accuracy often negatively correlated with output token count. Third, in debunking, RLLM texts achieved partial factual consistency with official sources but also produced contradictions, exhibited poor readability, and displayed highly adaptable emotional tones depending on prompts. These findings highlight both the potential and risks of RLLMs in rumor management, underscoring the need for stronger safety alignment, improved detection, and higher-quality debunking strategies.
About the journal:
Science has many big remaining questions. To address them, we will need to work collaboratively and across disciplines. The goal of iScience is to help fuel that type of interdisciplinary thinking. iScience is an open-access journal from Cell Press that provides a platform for original research in the life, physical, and earth sciences. The primary criterion for publication in iScience is a significant contribution to a relevant field combined with robust results and underlying methodology. The advances appearing in iScience include both fundamental and applied investigations across this interdisciplinary range of topic areas. To support transparency in scientific investigation, we are happy to consider replication studies and papers that describe negative results.
We know you want your work to be published quickly and to be widely visible within your community and beyond. With the strong international reputation of Cell Press behind it, publication in iScience will help your work garner the attention and recognition it merits. Like all Cell Press journals, iScience prioritizes rapid publication. Our editorial team pays special attention to high-quality author service and to efficient, clear-cut decisions based on the information available within the manuscript. iScience taps into the expertise across Cell Press journals and selected partners to inform our editorial decisions and help publish your science in a timely and seamless way.