The Development Landscape of Large Language Models for Biomedical Applications

Impact factor: 6.0; JCR Q1 (Mathematical & Computational Biology)

Annual Review of Biomedical Data Science. Pub Date: 2025-04-01. DOI: 10.1146/annurev-biodatasci-102224-074736

Zhiyuan Cao, Vipina K Keloth, Qianqian Xie, Lingfei Qian, Yuntian Liu, Yan Wang, Rui Shi, Weipeng Zhou, Gui Yang, Jeffrey Zhang, Xueqing Peng, Ethan Zhen, Ruey-Ling Weng, Qingyu Chen, Hua Xu


Citations: 0

Abstract


The Development Landscape of Large Language Models for Biomedical Applications.

Large language models (LLMs) have become powerful tools for biomedical applications, offering potential to transform healthcare and medical research. Since the release of ChatGPT in 2022, there has been a surge in LLMs for diverse biomedical applications. This review examines the landscape of text-based biomedical LLM development, analyzing model characteristics (e.g., architecture), development processes (e.g., training strategy), and applications (e.g., chatbots). Following PRISMA guidelines, 82 articles were selected out of 5,512 articles since 2022 that met our rigorous criteria, including the requirement of using biomedical data when training LLMs. Findings highlight the predominant use of decoder-only architectures such as Llama 7B, prevalence of task-specific fine-tuning, and reliance on biomedical literature for training. Challenges persist in balancing data openness with privacy concerns and detailing model development, including computational resources used. Future efforts would benefit from multimodal integration, LLMs for specialized medical applications, and improved data sharing and model accessibility.


Source journal

Annual Review of Biomedical Data Science

CiteScore

11.10

Self-citation rate

1.70%

Publication volume

About the journal: The Annual Review of Biomedical Data Science provides comprehensive expert reviews in biomedical data science, focusing on advanced methods to store, retrieve, analyze, and organize biomedical data and knowledge. The scope of the journal encompasses informatics, computational, artificial intelligence (AI), and statistical approaches to biomedical data, including the sub-fields of bioinformatics, computational biology, biomedical informatics, clinical and clinical research informatics, biostatistics, and imaging informatics. The mission of the journal is to identify both emerging and established areas of biomedical data science, and the leaders in these fields.