Cost-effective instruction learning for pathology vision and language analysis

Impact factor: 18.3. JCR quartile: Q1 (Computer Science, Interdisciplinary Applications)

Nature Computational Science. Pub Date: 2025-06-19. DOI: 10.1038/s43588-025-00818-5

Kaitao Chen, Mianxin Liu, Fang Yan, Lei Ma, Xiaoming Shi, Lilong Wang, Xiaosong Wang, Lifeng Zhu, Zhe Wang, Mu Zhou, Shaoting Zhang

Nature Computational Science, volume 5, issue 7, pages 524-533. Journal Article. https://www.nature.com/articles/s43588-025-00818-5

Citations: 0

Abstract

The advent of vision–language models fosters interactive conversations between artificial intelligence-enabled models and humans. However, applying these models in the clinic faces challenges related to large-scale training data as well as financial and computational resources. Here we propose CLOVER, a cost-effective instruction learning framework for conversational pathology. CLOVER trains a lightweight module and uses instruction tuning while freezing the parameters of the large language model. Instead of using costly GPT-4, we propose well-designed prompts on GPT-3.5 for building generation-based instructions, emphasizing the utility of pathological knowledge derived from Internet sources. We construct a high-quality set of template-based instructions in the context of digital pathology. Using two benchmark datasets, our findings reveal the strength of hybrid-form, pathological visual question–answer instructions. CLOVER outperforms baselines that possess 37 times more trainable parameters and exhibits few-shot capacity on an external clinical dataset. CLOVER could thus accelerate the adoption of rapid conversational applications in digital pathology.

Training foundation models often requires a costly budget and excessive computational resources. In this study, a low-cost instruction learning framework is proposed that could enable the rapid adoption of vision–language pathology applications.
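The idea of training a lightweight module while the large language model stays frozen can be illustrated with a small, dependency-free sketch. This is not CLOVER's implementation: the `Module` class, the module names, and the parameter counts below are hypothetical and chosen only to show how freezing reduces the number of parameters an optimizer has to update.

```python
from dataclasses import dataclass


@dataclass
class Module:
    """A named block of parameters; `trainable` mirrors a requires-grad flag."""
    name: str
    num_params: int
    trainable: bool = True


def freeze(module: Module) -> None:
    """Exclude a module's parameters from gradient updates."""
    module.trainable = False


def trainable_params(modules: list[Module]) -> int:
    """Count only the parameters an optimizer would update."""
    return sum(m.num_params for m in modules if m.trainable)


# Hypothetical sizes for illustration only (not taken from the paper).
lightweight_module = Module("lightweight_module", 100_000_000)
language_model = Module("language_model", 7_000_000_000)
model = [lightweight_module, language_model]

# Freeze the large language model; only the lightweight module is tuned.
freeze(language_model)

total = sum(m.num_params for m in model)
tuned = trainable_params(model)
print(f"trainable: {tuned:,} of {total:,} parameters ({tuned / total:.1%})")
```

In a real deep learning framework the same effect is usually obtained by disabling gradients on the frozen model's parameters and passing only the remaining parameters to the optimizer.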




Source journal

Nature computational science

CiteScore

11.70

Self-citation rate

0.00%

Number of publications