{"title":"CREST: A causal framework for mitigating shortcut learning in language models through counterfactual reasoning","authors":"Zhonghua Liu , Wei Shao , Shaolong E. , Xia Cao","doi":"10.1016/j.ipm.2025.104418","DOIUrl":null,"url":null,"abstract":"<div><div>Language models excel at numerous natural language processing tasks but often exploit surface patterns rather than develop genuine causal understanding. This limitation leads to vulnerability when encountering out-of-distribution examples or adversarial inputs. We present CREST, a framework incorporating causal learning principles into language models to mitigate shortcut learning through counterfactual reasoning. CREST integrates a causally-informed pre-trained question-answering model with a debiasing framework utilizing counterfactual analysis. The framework explicitly models the causal structure of question-answering tasks and employs controlled interventions to differentiate authentic reasoning pathways from shortcuts. Our multi-branch architecture separates robust causal reasoning from potential shortcut pathways, while the counterfactual reasoning component regulates variable interactions during training and inference. Experiments across six datasets (DREAM, SQuAD, TriviaQA, HotpotQA, TyDi-QA, QuAC) demonstrate CREST achieves 2.41–3.15% improvements over eight baselines, with the strongest gains on multi-hop reasoning. Validation on large language models GPT-J (6B) and Llama-2 (7B) using knowledge editing scenarios shows CREST achieving 44.13% and 45.83% success rates, substantially outperforming Fine-Tuning (19.97–22.97%), ROME (13.67–17.66%), MEMIT (14.05–20.13%), and MeLLo (24.97–29.62%). Critically, CREST demonstrates superior hop-wise accuracy of 30.60% on GPT-J and 60.44% on Llama-2, indicating genuine step-by-step reasoning compared to MeLLo’s 0.21–9.90%. CREST exhibits superior adversarial robustness, maintaining 78.4–85.3% performance under attacks compared to 71.2–77.5% for the best baseline. While requiring 30% additional training time, CREST maintains competitive inference speeds with only 3% latency increase, demonstrating practical feasibility for real-world deployment.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"63 2","pages":"Article 104418"},"PeriodicalIF":6.9000,"publicationDate":"2025-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Processing & Management","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0306457325003590","RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0
Abstract
Language models excel at numerous natural language processing tasks but often exploit surface patterns rather than developing genuine causal understanding. This limitation leads to vulnerability when encountering out-of-distribution examples or adversarial inputs. We present CREST, a framework that incorporates causal learning principles into language models to mitigate shortcut learning through counterfactual reasoning. CREST integrates a causally-informed pre-trained question-answering model with a debiasing framework utilizing counterfactual analysis. The framework explicitly models the causal structure of question-answering tasks and employs controlled interventions to differentiate authentic reasoning pathways from shortcuts. Our multi-branch architecture separates robust causal reasoning from potential shortcut pathways, while the counterfactual reasoning component regulates variable interactions during training and inference. Experiments across six datasets (DREAM, SQuAD, TriviaQA, HotpotQA, TyDi-QA, QuAC) demonstrate that CREST achieves 2.41–3.15% improvements over eight baselines, with the strongest gains on multi-hop reasoning. Validation on the large language models GPT-J (6B) and Llama-2 (7B) in knowledge-editing scenarios shows CREST achieving success rates of 44.13% and 45.83%, respectively, substantially outperforming Fine-Tuning (19.97–22.97%), ROME (13.67–17.66%), MEMIT (14.05–20.13%), and MeLLo (24.97–29.62%). Critically, CREST demonstrates superior hop-wise accuracy of 30.60% on GPT-J and 60.44% on Llama-2, indicating genuine step-by-step reasoning, compared to MeLLo’s 0.21–9.90%. CREST also exhibits superior adversarial robustness, maintaining 78.4–85.3% performance under attack versus 71.2–77.5% for the best baseline. While requiring 30% additional training time, CREST maintains competitive inference speed with only a 3% latency increase, demonstrating practical feasibility for real-world deployment.
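The abstract describes CREST's multi-branch design only at a high level and gives no implementation detail. As a rough, non-authoritative illustration of the general two-branch debiasing idea mentioned above (a context-grounded reasoning branch plus a question-only shortcut branch, whose contribution is removed at inference through a counterfactual step), the following minimal PyTorch sketch may help. All names (TwoBranchQAScorer, main_branch, shortcut_branch), the simple additive fusion, and the random stand-in features are assumptions made for illustration, not the paper's actual architecture or training objective.

```python
# Minimal sketch (assumed, not the authors' code) of two-branch debiasing
# with counterfactual inference for answer scoring.
import torch
import torch.nn as nn


class TwoBranchQAScorer(nn.Module):
    """Scores answer candidates with a main (question + context) branch and a
    question-only shortcut branch; counterfactual inference subtracts the
    shortcut branch's direct effect from the fused prediction."""

    def __init__(self, hidden_dim: int, num_answers: int):
        super().__init__()
        # Main branch sees question and context representations concatenated.
        self.main_branch = nn.Linear(2 * hidden_dim, num_answers)
        # Shortcut branch sees the question representation only.
        self.shortcut_branch = nn.Linear(hidden_dim, num_answers)

    def forward(self, q_repr: torch.Tensor, ctx_repr: torch.Tensor):
        logits_main = self.main_branch(torch.cat([q_repr, ctx_repr], dim=-1))
        logits_shortcut = self.shortcut_branch(q_repr)
        # Factual prediction: both pathways are active (used during training).
        logits_factual = logits_main + logits_shortcut
        return logits_factual, logits_shortcut

    @torch.no_grad()
    def counterfactual_predict(self, q_repr: torch.Tensor, ctx_repr: torch.Tensor):
        # Counterfactual step: estimate what the shortcut pathway alone would
        # predict, then subtract it so only the context-grounded pathway
        # drives the final answer.
        logits_factual, logits_shortcut = self.forward(q_repr, ctx_repr)
        return logits_factual - logits_shortcut


if __name__ == "__main__":
    # Usage sketch with random features standing in for encoder outputs.
    model = TwoBranchQAScorer(hidden_dim=768, num_answers=4)
    q = torch.randn(2, 768)    # question-only representations
    ctx = torch.randn(2, 768)  # context representations
    debiased_logits = model.counterfactual_predict(q, ctx)
    print(debiased_logits.shape)  # torch.Size([2, 4])
```

In this toy setup the debiased logits retain only the effect that depends on the context representation; CREST's actual counterfactual reasoning component, causal graph, and training procedure are more involved than this additive subtraction.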
Journal Introduction:
Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology, marketing, and social computing.
We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.