A survey of slow thinking-based reasoning LLMs using reinforcement learning and test-time scaling law

Qianjun Pan, Wenkai Ji, Yuyang Ding, Junsong Li, Shilian Chen, Junyi Wang, Jie Zhou, Qin Chen, Min Zhang, Yulan Wu, Liang He

Information Processing & Management, Volume 63, Issue 2, Article 104394. Published 2025-09-12. DOI: 10.1016/j.ipm.2025.104394
Abstract
This survey presents a focused and conceptually distinct framework for understanding recent advancements in reasoning large language models (LLMs) designed to emulate “slow thinking”, a deliberate, analytical mode of cognition analogous to System 2 in dual-process theory from cognitive psychology. While prior review works have surveyed reasoning LLMs through fragmented lenses, such as isolated technical paradigms (e.g., reinforcement learning or test-time scaling) or broad post-training taxonomies, this work uniquely integrates reinforcement learning and test-time scaling as synergistic mechanisms within a unified “slow thinking” paradigm. By synthesizing insights from over 200 studies, we identify three interdependent pillars that collectively enable advanced reasoning: (1) Test-time scaling, which dynamically allocates computational resources based on task complexity via search, adaptive computation, and verification; (2) Reinforcement learning, which refines reasoning trajectories through reward modeling, policy optimization, and self-improvement; and (3) Slow-thinking frameworks, which structure reasoning into stepwise, hierarchical, or hybrid processes such as long Chain-of-Thought and multi-agent deliberation. Unlike existing surveys, our framework is goal-oriented, centering on the cognitive objective of “slow thinking” as both a unifying principle and a design imperative. This perspective enables a systematic analysis of how diverse techniques converge toward human-like deep reasoning. The survey charts a trajectory toward next-generation LLMs that balance cognitive fidelity with computational efficiency, while also outlining key challenges and future directions. Advancing such reasoning capabilities is essential for deploying LLMs in high-stakes domains including scientific discovery, autonomous agents, and complex decision support systems.
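To make the first pillar concrete, the sketch below illustrates one common test-time scaling pattern the abstract alludes to: best-of-N sampling with verifier-based selection, where extra compute (a larger n) buys a wider search over candidate reasoning traces. The `generate` and `verifier_score` callables are hypothetical placeholders standing in for a model's sampling API and a learned or programmatic verifier; they are not an interface defined in the surveyed paper.

```python
# Minimal sketch of test-time scaling via best-of-N sampling with a verifier.
# `generate` and `verifier_score` are hypothetical stand-ins, not an API from the paper.

from typing import Callable, List, Tuple

def best_of_n(
    prompt: str,
    generate: Callable[[str], str],               # samples one candidate reasoning trace + answer
    verifier_score: Callable[[str, str], float],  # scores a candidate given the prompt
    n: int = 8,
) -> Tuple[str, float]:
    """Sample n candidates and return the one the verifier ranks highest."""
    candidates: List[str] = [generate(prompt) for _ in range(n)]
    scored = [(c, verifier_score(prompt, c)) for c in candidates]
    return max(scored, key=lambda pair: pair[1])

# Toy usage: spending more test-time compute (larger n) widens the search.
if __name__ == "__main__":
    import random
    toy_generate = lambda p: f"answer-{random.randint(0, 9)}"
    toy_score = lambda p, c: float(c.endswith("7"))  # pretend the verifier prefers answers ending in 7
    best, score = best_of_n("What is 3 + 4?", toy_generate, toy_score, n=8)
    print(best, score)
```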
About the journal:
Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology, marketing, and social computing.
We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.