Biomedical Visual Instruction Tuning with Clinician Preference Alignment

Hejie Cui, Lingjun Mao, Xin Liang, Jieyu Zhang, Hui Ren, Quanzheng Li, Xiang Li, Carl Yang
{"title":"Biomedical Visual Instruction Tuning with Clinician Preference Alignment.","authors":"Hejie Cui, Lingjun Mao, Xin Liang, Jieyu Zhang, Hui Ren, Quanzheng Li, Xiang Li, Carl Yang","doi":"","DOIUrl":null,"url":null,"abstract":"<p><p>Recent advancements in multimodal foundation models have showcased impressive capabilities in understanding and reasoning with visual and textual information. Adapting these foundation models trained for general usage to specialized domains like biomedicine requires large-scale domain-specific instruction datasets. While existing works have explored curating such datasets automatically, the resultant datasets are not explicitly aligned with domain expertise. In this work, we propose a data-centric framework, <b>Biomed</b>ical <b>V</b>isual <b>I</b>nstruction <b>T</b>uning with <b>C</b>linician Preference <b>Al</b>ignment (BioMed-VITAL), that incorporates clinician preferences into both stages of generating and selecting instruction data for tuning biomedical multimodal foundation models. First, during the generation stage, we prompt the GPT-4V generator with a diverse set of clinician-selected demonstrations for preference-aligned data candidate generation. Then, during the selection phase, we train a separate selection model, which explicitly distills clinician and policy-guided model preferences into a rating function to select high-quality data for medical instruction tuning. Results show that the model tuned with the instruction-following data from our method demonstrates a significant improvement in open visual chat (18.5% relatively) and medical VQA (win rate up to 81.73%). Our instruction-following data and models are available at https://BioMed-VITAL.github.io.</p>","PeriodicalId":72099,"journal":{"name":"Advances in neural information processing systems","volume":"37 ","pages":"96449-96467"},"PeriodicalIF":0.0000,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11867732/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Advances in neural information processing systems","FirstCategoryId":"1085","ListUrlMain":"","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Recent advancements in multimodal foundation models have showcased impressive capabilities in understanding and reasoning with visual and textual information. Adapting these foundation models trained for general usage to specialized domains like biomedicine requires large-scale domain-specific instruction datasets. While existing works have explored curating such datasets automatically, the resultant datasets are not explicitly aligned with domain expertise. In this work, we propose a data-centric framework, Biomedical Visual Instruction Tuning with Clinician Preference Alignment (BioMed-VITAL), that incorporates clinician preferences into both stages of generating and selecting instruction data for tuning biomedical multimodal foundation models. First, during the generation stage, we prompt the GPT-4V generator with a diverse set of clinician-selected demonstrations for preference-aligned data candidate generation. Then, during the selection phase, we train a separate selection model, which explicitly distills clinician and policy-guided model preferences into a rating function to select high-quality data for medical instruction tuning. Results show that the model tuned with the instruction-following data from our method demonstrates a significant improvement in open visual chat (an 18.5% relative improvement) and medical VQA (win rate up to 81.73%). Our instruction-following data and models are available at https://BioMed-VITAL.github.io.
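The abstract describes a two-stage, data-centric pipeline: preference-aligned candidate generation by prompting GPT-4V with clinician-selected demonstrations, followed by selection with a learned rating function distilled from clinician and policy-guided model preferences. The Python sketch below illustrates the overall shape of such a pipeline only; it is not the authors' released implementation, and every name in it (build_fewshot_prompt, generate_candidates, select_instructions, the rate callable) is a hypothetical stand-in for exposition.

```python
# Minimal, hypothetical sketch of a two-stage generate-then-select pipeline
# as outlined in the abstract. All names are illustrative assumptions, not
# the authors' released code.

from typing import Callable


def build_fewshot_prompt(clinician_demos: list[str], image_id: str) -> str:
    """Concatenate clinician-selected demonstrations as few-shot context,
    then ask the generator to produce instruction data for a new image."""
    demos = "\n\n".join(clinician_demos)
    return f"{demos}\n\nGenerate visual-instruction Q&A for image {image_id}:"


def generate_candidates(
    image_ids: list[str],
    clinician_demos: list[str],
    generator: Callable[[str], str],  # e.g., a wrapper around a GPT-4V API call
) -> list[dict]:
    """Stage 1: preference-aligned candidate generation."""
    return [
        {"image": iid, "qa": generator(build_fewshot_prompt(clinician_demos, iid))}
        for iid in image_ids
    ]


def select_instructions(
    candidates: list[dict],
    rate: Callable[[dict], float],  # rating function distilled from preferences
    k: int,
) -> list[dict]:
    """Stage 2: keep the top-k candidates under the rating function."""
    return sorted(candidates, key=rate, reverse=True)[:k]


# Example usage with a stub generator and a toy rating heuristic.
if __name__ == "__main__":
    def stub_generator(prompt: str) -> str:
        return "Q: What does the scan show? A: (stub model output)"

    demos = ["Q: Identify the modality. A: Chest X-ray, PA view."]
    cands = generate_candidates(["img_001", "img_002"], demos, stub_generator)
    best = select_instructions(cands, rate=lambda c: len(c["qa"]), k=1)
    print(best)
```

In the method as described, the rating function would be a trained selection model rather than the toy length heuristic used in this demo.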
