Preference Tuning with Human Feedback on Language, Speech, and Vision Tasks: A Survey
Genta Indra Winata, Hanyang Zhao, Anirban Das, Wenpin Tang, David D. Yao, Shi-Xiong Zhang, Sambit Sahu
arXiv:2409.11564 (Audio and Speech Processing), 2024-09-17
Abstract
Preference tuning is a crucial process for aligning deep generative models with human preferences. This survey offers a thorough overview of recent advancements in preference tuning and the integration of human feedback. The paper is organized into three main sections: 1) introduction and preliminaries, covering reinforcement learning frameworks, preference tuning tasks, models, and datasets across the language, speech, and vision modalities, as well as different policy approaches; 2) an in-depth examination of each preference tuning approach, with a detailed analysis of its methods; and 3) applications, discussion, and future directions, exploring downstream applications of preference tuning, evaluation methods for the different modalities, and an outlook on future research. Our objective is to present the latest methodologies in preference tuning and model alignment so that researchers and practitioners can better understand the field, and we hope to encourage further engagement and innovation in this area.
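
For readers new to the area, the sketch below illustrates one representative preference tuning objective that falls within the survey's scope, Direct Preference Optimization (DPO): the policy is trained directly on pairs of preferred and dispreferred responses against a frozen reference model, without fitting an explicit reward model. This is a minimal illustration, not code from the surveyed works; the function name, tensor shapes, and the choice of beta are assumptions made here for the example.

```python
import torch
import torch.nn.functional as F


def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO loss on a batch of preference pairs (illustrative sketch).

    Each argument is a 1-D tensor holding the summed token log-probability
    of a response (one value per preference pair); beta scales the
    implicit reward derived from the policy/reference log-ratio.
    """
    # Implicit rewards: log-probability ratios of the policy vs. the frozen reference.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Bradley-Terry preference likelihood: the chosen response should earn
    # a higher implicit reward than the rejected one.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()


# Toy usage with random log-probabilities for a batch of 4 preference pairs.
torch.manual_seed(0)
policy_chosen = torch.randn(4)
policy_rejected = torch.randn(4)
ref_chosen = torch.randn(4)
ref_rejected = torch.randn(4)
print(dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected))
```

In practice the log-probabilities would come from scoring human-annotated preference pairs with the current policy and the reference model; the survey itself covers this and many alternative objectives across language, speech, and vision.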