RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs

Impact Factor: 28 | CAS Zone 1, Computer Science | Q1, Computer Science, Theory & Methods

ACM Computing Surveys | Pub Date: 2025-06-05 | DOI: 10.1145/3743127

Shreyas Chaudhari, Pranjal Aggarwal, Vishvak Murahari, Tanmay Rajpurohit, Ashwin Kalyan, Karthik Narasimhan, Ameet Deshpande, Bruno Castro da Silva

Citations: 0

Abstract

A significant challenge in training large language models (LLMs) as effective assistants is aligning them with human preferences. Reinforcement learning from human feedback (RLHF) has emerged as a promising solution. However, our understanding of RLHF is often limited to initial design choices. This paper analyzes RLHF through reinforcement learning principles, focusing on the reward model. It examines modeling choices and function approximation caveats, highlighting assumptions about reward expressivity and revealing limitations like incorrect generalization, model misspecification, and sparse feedback. A categorical review of current literature provides insights for researchers to understand the challenges of RLHF and build upon existing methods.

Source journal

ACM Computing Surveys (Engineering & Technology, Computer Science: Theory & Methods)

CiteScore: 33.20
Self-citation rate: 0.60%
Publication count: 372
Review time: 12 months

About the journal: ACM Computing Surveys is an academic journal that focuses on publishing surveys and tutorials on various areas of computing research and practice. The journal aims to provide comprehensive and easily understandable articles that guide readers through the literature and help them understand topics outside their specialties. In terms of impact, CSUR has a high reputation with a 2022 Impact Factor of 16.6. It is ranked 3rd out of 111 journals in the field of Computer Science Theory & Methods. ACM Computing Surveys is indexed and abstracted in various services, including AI2 Semantic Scholar, Baidu, Clarivate/ISI: JCR, CNKI, DeepDyve, DTU, EBSCO: EDS/HOST, and IET Inspec, among others.