Yongping Du, Siyuan Li, Rui Yan, Ying Hou, Honggui Han
{"title":"艾尔沃德:授权语言模型与世界洞察力和人类对齐的奖励设计","authors":"Yongping Du, Siyuan Li, Rui Yan, Ying Hou, Honggui Han","doi":"10.1111/exsy.70055","DOIUrl":null,"url":null,"abstract":"<div>\n \n <p>Large language models (LLMs) have made significant progress in many tasks, but they may also generate biased or misleading outputs. Alignment techniques address this issue by refining models to reflect human values, but high-quality preference datasets are limited. This study introduces a method to train a high-performance reward model (RM) by integrating open knowledge with human feedback. We construct the Open Knowledge and Human Feedback (OK-HF) dataset, comprising 39.8 million open preference data entries and 30,000 human feedback entries. The dual-stage aligning strategy is proposed to combine preference pre-training with domain adaptation, leveraging multi-objective optimization to enhance learning from both preference data and fine-grained human feedback. The Open Knowledge and Human-feedback Reward Model (OKH-RM), designed with the dual-stage aligning strategy on the OK-HF dataset, demonstrates exceptional performance in aligning LLMs with human preferences. The experimental results show that OKH-RM outperforms Llama2-RM, Qwen-RM and Ultra-RM, particularly achieving an accuracy of 85.93% on the Stanford SHP dataset. The model has shown advanced capabilities in detecting low-quality repetitive responses and mitigating biases related to response length.</p>\n </div>","PeriodicalId":51053,"journal":{"name":"Expert Systems","volume":"42 6","pages":""},"PeriodicalIF":2.3000,"publicationDate":"2025-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"ELWARD: Empowering Language Model With World Insights and Human-Aligned Reward Design\",\"authors\":\"Yongping Du, Siyuan Li, Rui Yan, Ying Hou, Honggui Han\",\"doi\":\"10.1111/exsy.70055\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div>\\n \\n <p>Large language models (LLMs) have made significant progress in many tasks, but they may also generate biased or misleading outputs. Alignment techniques address this issue by refining models to reflect human values, but high-quality preference datasets are limited. This study introduces a method to train a high-performance reward model (RM) by integrating open knowledge with human feedback. We construct the Open Knowledge and Human Feedback (OK-HF) dataset, comprising 39.8 million open preference data entries and 30,000 human feedback entries. The dual-stage aligning strategy is proposed to combine preference pre-training with domain adaptation, leveraging multi-objective optimization to enhance learning from both preference data and fine-grained human feedback. The Open Knowledge and Human-feedback Reward Model (OKH-RM), designed with the dual-stage aligning strategy on the OK-HF dataset, demonstrates exceptional performance in aligning LLMs with human preferences. The experimental results show that OKH-RM outperforms Llama2-RM, Qwen-RM and Ultra-RM, particularly achieving an accuracy of 85.93% on the Stanford SHP dataset. 
The model has shown advanced capabilities in detecting low-quality repetitive responses and mitigating biases related to response length.</p>\\n </div>\",\"PeriodicalId\":51053,\"journal\":{\"name\":\"Expert Systems\",\"volume\":\"42 6\",\"pages\":\"\"},\"PeriodicalIF\":2.3000,\"publicationDate\":\"2025-05-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Expert Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1111/exsy.70055\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Expert Systems","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1111/exsy.70055","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
ELWARD: Empowering Language Model With World Insights and Human-Aligned Reward Design
Large language models (LLMs) have made significant progress on many tasks, but they may also generate biased or misleading outputs. Alignment techniques address this issue by refining models to reflect human values, but high-quality preference datasets remain scarce. This study introduces a method for training a high-performance reward model (RM) by integrating open knowledge with human feedback. We construct the Open Knowledge and Human Feedback (OK-HF) dataset, comprising 39.8 million open preference entries and 30,000 human feedback entries. A dual-stage alignment strategy is proposed that combines preference pre-training with domain adaptation, leveraging multi-objective optimization to learn from both preference data and fine-grained human feedback. The Open Knowledge and Human-feedback Reward Model (OKH-RM), trained with this dual-stage strategy on the OK-HF dataset, demonstrates strong performance in aligning LLMs with human preferences. Experimental results show that OKH-RM outperforms Llama2-RM, Qwen-RM, and Ultra-RM, notably achieving an accuracy of 85.93% on the Stanford SHP dataset. The model also shows advanced capabilities in detecting low-quality repetitive responses and mitigating biases related to response length.
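The abstract does not spell out the exact training objective, but reward models of this kind are typically optimized with a pairwise (Bradley-Terry) preference loss, and the "multi-objective" element suggests an additional term driven by fine-grained human feedback. The sketch below is a minimal, hypothetical illustration of that idea in PyTorch: `RewardHead`, `pairwise_preference_loss`, `feedback_regression_loss`, and the 0.5 weighting are all illustrative assumptions, not the paper's actual OKH-RM design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardHead(nn.Module):
    """Maps a pooled LLM hidden state to a scalar reward score (illustrative stand-in)."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.score = nn.Linear(hidden_size, 1)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, hidden_size) pooled representation of a response
        return self.score(hidden).squeeze(-1)  # (batch,)

def pairwise_preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    """Standard Bradley-Terry objective: push the chosen response's score above the rejected one's."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()

def feedback_regression_loss(r_pred: torch.Tensor, human_score: torch.Tensor) -> torch.Tensor:
    """Assumed second objective: regress predicted rewards onto fine-grained human ratings."""
    return F.mse_loss(r_pred, human_score)

if __name__ == "__main__":
    torch.manual_seed(0)
    head = RewardHead(hidden_size=16)
    chosen, rejected = torch.randn(4, 16), torch.randn(4, 16)  # toy pooled states of response pairs
    rated, ratings = torch.randn(4, 16), torch.rand(4)          # toy responses with human scores in [0, 1]

    # Combine the two objectives; the 0.5 weight is arbitrary, not taken from the paper.
    loss = pairwise_preference_loss(head(chosen), head(rejected)) \
        + 0.5 * feedback_regression_loss(head(rated), ratings)
    loss.backward()
    print(f"combined loss: {loss.item():.4f}")
```

In practice the pooled hidden states would come from the backbone LLM rather than random tensors, and the weighting between the preference and feedback terms would itself be tuned as part of the multi-objective optimization the paper describes.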
Journal overview:
Expert Systems: The Journal of Knowledge Engineering publishes papers dealing with all aspects of knowledge engineering, including individual methods and techniques in knowledge acquisition and representation, and their application in the construction of systems – including expert systems – based thereon. Detailed scientific evaluation is an essential part of any paper.
Beyond traditional application areas, such as Software and Requirements Engineering, Human-Computer Interaction, and Artificial Intelligence, the journal also targets new and growing markets for these technologies, such as Business, Economics, Market Research, and Medical and Health Care. The shift towards this new focus will be marked by a series of special issues covering hot and emerging topics.