{"title":"CREST: A causal framework for mitigating shortcut learning in language models through counterfactual reasoning","authors":"Zhonghua Liu , Wei Shao , Shaolong E. , Xia Cao","doi":"10.1016/j.ipm.2025.104418","DOIUrl":null,"url":null,"abstract":"<div><div>Language models excel at numerous natural language processing tasks but often exploit surface patterns rather than develop genuine causal understanding. This limitation leads to vulnerability when encountering out-of-distribution examples or adversarial inputs. We present CREST, a framework incorporating causal learning principles into language models to mitigate shortcut learning through counterfactual reasoning. CREST integrates a causally-informed pre-trained question-answering model with a debiasing framework utilizing counterfactual analysis. The framework explicitly models the causal structure of question-answering tasks and employs controlled interventions to differentiate authentic reasoning pathways from shortcuts. Our multi-branch architecture separates robust causal reasoning from potential shortcut pathways, while the counterfactual reasoning component regulates variable interactions during training and inference. Experiments across six datasets (DREAM, SQuAD, TriviaQA, HotpotQA, TyDi-QA, QuAC) demonstrate CREST achieves 2.41–3.15% improvements over eight baselines, with the strongest gains on multi-hop reasoning. Validation on large language models GPT-J (6B) and Llama-2 (7B) using knowledge editing scenarios shows CREST achieving 44.13% and 45.83% success rates, substantially outperforming Fine-Tuning (19.97–22.97%), ROME (13.67–17.66%), MEMIT (14.05–20.13%), and MeLLo (24.97–29.62%). Critically, CREST demonstrates superior hop-wise accuracy of 30.60% on GPT-J and 60.44% on Llama-2, indicating genuine step-by-step reasoning compared to MeLLo’s 0.21–9.90%. CREST exhibits superior adversarial robustness, maintaining 78.4–85.3% performance under attacks compared to 71.2–77.5% for the best baseline. While requiring 30% additional training time, CREST maintains competitive inference speeds with only 3% latency increase, demonstrating practical feasibility for real-world deployment.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"63 2","pages":"Article 104418"},"PeriodicalIF":6.9000,"publicationDate":"2025-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Processing & Management","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0306457325003590","RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0
Abstract
Language models excel at numerous natural language processing tasks but often exploit surface patterns rather than developing genuine causal understanding. This limitation leads to vulnerability when encountering out-of-distribution examples or adversarial inputs. We present CREST, a framework that incorporates causal learning principles into language models to mitigate shortcut learning through counterfactual reasoning. CREST integrates a causally-informed pre-trained question-answering model with a debiasing framework utilizing counterfactual analysis. The framework explicitly models the causal structure of question-answering tasks and employs controlled interventions to differentiate authentic reasoning pathways from shortcuts. Our multi-branch architecture separates robust causal reasoning from potential shortcut pathways, while the counterfactual reasoning component regulates variable interactions during training and inference. Experiments across six datasets (DREAM, SQuAD, TriviaQA, HotpotQA, TyDi-QA, QuAC) demonstrate that CREST achieves 2.41–3.15% improvements over eight baselines, with the strongest gains on multi-hop reasoning. Validation on the large language models GPT-J (6B) and Llama-2 (7B) in knowledge-editing scenarios shows CREST achieving success rates of 44.13% and 45.83%, respectively, substantially outperforming Fine-Tuning (19.97–22.97%), ROME (13.67–17.66%), MEMIT (14.05–20.13%), and MeLLo (24.97–29.62%). Critically, CREST demonstrates superior hop-wise accuracy of 30.60% on GPT-J and 60.44% on Llama-2, indicating genuine step-by-step reasoning, compared to MeLLo’s 0.21–9.90%. CREST also exhibits superior adversarial robustness, maintaining 78.4–85.3% performance under attack versus 71.2–77.5% for the best baseline. While requiring 30% additional training time, CREST maintains competitive inference speed with only a 3% latency increase, demonstrating practical feasibility for real-world deployment.
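The abstract describes CREST's multi-branch design only at a high level and gives no implementation detail. As a rough, non-authoritative illustration of the general two-branch debiasing idea mentioned above (a context-grounded reasoning branch plus a question-only shortcut branch, whose contribution is removed at inference through a counterfactual step), the following minimal PyTorch sketch may help. All names (TwoBranchQAScorer, main_branch, shortcut_branch), the simple additive fusion, and the random stand-in features are assumptions made for illustration, not the paper's actual architecture or training objective.

```python
# Minimal sketch (assumed, not the authors' code) of two-branch debiasing
# with counterfactual inference for answer scoring.
import torch
import torch.nn as nn


class TwoBranchQAScorer(nn.Module):
    """Scores answer candidates with a main (question + context) branch and a
    question-only shortcut branch; counterfactual inference subtracts the
    shortcut branch's direct effect from the fused prediction."""

    def __init__(self, hidden_dim: int, num_answers: int):
        super().__init__()
        # Main branch sees question and context representations concatenated.
        self.main_branch = nn.Linear(2 * hidden_dim, num_answers)
        # Shortcut branch sees the question representation only.
        self.shortcut_branch = nn.Linear(hidden_dim, num_answers)

    def forward(self, q_repr: torch.Tensor, ctx_repr: torch.Tensor):
        logits_main = self.main_branch(torch.cat([q_repr, ctx_repr], dim=-1))
        logits_shortcut = self.shortcut_branch(q_repr)
        # Factual prediction: both pathways are active (used during training).
        logits_factual = logits_main + logits_shortcut
        return logits_factual, logits_shortcut

    @torch.no_grad()
    def counterfactual_predict(self, q_repr: torch.Tensor, ctx_repr: torch.Tensor):
        # Counterfactual step: estimate what the shortcut pathway alone would
        # predict, then subtract it so only the context-grounded pathway
        # drives the final answer.
        logits_factual, logits_shortcut = self.forward(q_repr, ctx_repr)
        return logits_factual - logits_shortcut


if __name__ == "__main__":
    # Usage sketch with random features standing in for encoder outputs.
    model = TwoBranchQAScorer(hidden_dim=768, num_answers=4)
    q = torch.randn(2, 768)    # question-only representations
    ctx = torch.randn(2, 768)  # context representations
    debiased_logits = model.counterfactual_predict(q, ctx)
    print(debiased_logits.shape)  # torch.Size([2, 4])
```

In this toy setup the debiased logits retain only the effect that depends on the context representation; CREST's actual counterfactual reasoning component, causal graph, and training procedure are more involved than this additive subtraction.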
Journal Introduction:
Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology, marketing, and social computing.
We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.