Preference Tuning with Human Feedback on Language, Speech, and Vision Tasks: A Survey

Genta Indra Winata, Hanyang Zhao, Anirban Das, Wenpin Tang, David D. Yao, Shi-Xiong Zhang, Sambit Sahu
arXiv:2409.11564 · arXiv - EE - Audio and Speech Processing · Published 2024-09-17
Citation count: 0

Abstract

Preference tuning is a crucial process for aligning deep generative models with human preferences. This survey offers a thorough overview of recent advancements in preference tuning and the integration of human feedback. The paper is organized into three main sections: 1) introduction and preliminaries: an introduction to reinforcement learning frameworks, preference tuning tasks, models, and datasets across various modalities: language, speech, and vision, as well as different policy approaches, 2) in-depth examination of each preference tuning approach: a detailed analysis of the methods used in preference tuning, and 3) applications, discussion, and future directions: an exploration of the applications of preference tuning in downstream tasks, including evaluation methods for different modalities, and an outlook on future research directions. Our objective is to present the latest methodologies in preference tuning and model alignment, enhancing the understanding of this field for researchers and practitioners. We hope to encourage further engagement and innovation in this area.
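Among the preference-tuning approaches such a survey typically covers, Direct Preference Optimization (DPO) is a common example: it replaces the RL step of RLHF with a supervised loss over pairs of preferred and dispreferred responses. The sketch below is an illustration of that general loss, not code from the paper; the log-probability values in the example are made up.

```python
import math

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for a single preference pair.

    Inputs are total log-probabilities of the chosen (preferred) and
    rejected responses under the policy being tuned and under a frozen
    reference model; beta scales the implicit reward.
    """
    # Implicit reward margins: how far the policy has moved from the
    # reference model on each response.
    chosen_margin = policy_logp_chosen - ref_logp_chosen
    rejected_margin = policy_logp_rejected - ref_logp_rejected
    # Logistic loss on the scaled margin difference; minimizing it
    # pushes the policy to prefer the chosen response.
    logits = beta * (chosen_margin - rejected_margin)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))  # -log(sigmoid)

# At initialization the policy equals the reference, so both margins
# are zero and the loss is -log(0.5) = log(2) ≈ 0.693.
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
```

Once the policy assigns the chosen response a higher log-probability than the reference does (while the rejected response stays put), the margin difference turns positive and the loss drops below log(2).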