ELWARD: Empowering Language Model With World Insights and Human-Aligned Reward Design

IF 2.3 · Zone 4, Computer Science · JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Expert Systems, vol. 42, no. 6. Pub Date: 2025-05-12. DOI: 10.1111/exsy.70055

Yongping Du, Siyuan Li, Rui Yan, Ying Hou, Honggui Han


Citations: 0

Abstract


ELWARD: Empowering Language Model With World Insights and Human-Aligned Reward Design

Large language models (LLMs) have made significant progress in many tasks, but they may also generate biased or misleading outputs. Alignment techniques address this issue by refining models to reflect human values, but high-quality preference datasets are limited. This study introduces a method to train a high-performance reward model (RM) by integrating open knowledge with human feedback. We construct the Open Knowledge and Human Feedback (OK-HF) dataset, comprising 39.8 million open preference data entries and 30,000 human feedback entries. The dual-stage aligning strategy is proposed to combine preference pre-training with domain adaptation, leveraging multi-objective optimization to enhance learning from both preference data and fine-grained human feedback. The Open Knowledge and Human-feedback Reward Model (OKH-RM), designed with the dual-stage aligning strategy on the OK-HF dataset, demonstrates exceptional performance in aligning LLMs with human preferences. The experimental results show that OKH-RM outperforms Llama2-RM, Qwen-RM and Ultra-RM, particularly achieving an accuracy of 85.93% on the Stanford SHP dataset. The model has shown advanced capabilities in detecting low-quality repetitive responses and mitigating biases related to response length.
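The abstract does not state the training objective or how accuracy is measured. As an illustration only, the sketch below assumes the common Bradley-Terry style pairwise loss for reward models and assumes accuracy means the fraction of preference pairs in which the human-preferred response receives the higher reward. The function names `pairwise_loss` and `pairwise_accuracy` are hypothetical and are not taken from the paper.

```python
import math


def pairwise_loss(r_chosen: float, r_rejected: float) -> float:
    """Assumed objective: -log(sigmoid(r_chosen - r_rejected)).

    Computed as softplus(-margin) in a numerically stable form.
    """
    margin = r_chosen - r_rejected
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def pairwise_accuracy(pairs) -> float:
    """Fraction of (chosen, rejected) reward pairs ranked correctly.

    A pair counts as correct only when the chosen response scores
    strictly higher than the rejected one.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one preference pair")
    correct = sum(1 for chosen, rejected in pairs if chosen > rejected)
    return correct / len(pairs)


if __name__ == "__main__":
    # Made-up reward scores for four preference pairs (illustrative only).
    scores = [(1.0, 0.0), (0.2, 0.5), (3.0, 1.0), (0.1, 0.0)]
    print(pairwise_accuracy(scores))
    print(pairwise_loss(2.0, 0.0), pairwise_loss(0.0, 2.0))
```

Under this assumed objective, the loss shrinks as the reward margin in favor of the preferred response grows, so lowering the loss over a preference dataset tends to raise the pairwise accuracy defined above.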


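Source journal

Expert Systems (Engineering & Technology; Computer Science: Theory & Methods)

CiteScore: 7.40

Self-citation rate: 6.10%

Articles published: 266

Review time: 24 months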

About the journal: Expert Systems: The Journal of Knowledge Engineering publishes papers dealing with all aspects of knowledge engineering, including individual methods and techniques in knowledge acquisition and representation, and their application in the construction of systems – including expert systems – based thereon. Detailed scientific evaluation is an essential part of any paper. As well as traditional application areas, such as Software and Requirements Engineering, Human-Computer Interaction, and Artificial Intelligence, we are aiming at the new and growing markets for these technologies, such as Business, Economy, Market Research, and Medical and Health Care. The shift towards this new focus will be marked by a series of special issues covering hot and emergent topics.