{"title":"面向大型语言模型医学文本生成的综合评价:程序化度量、人的评估和大型语言模型判断","authors":"Han Yuan","doi":"10.1002/med4.70002","DOIUrl":null,"url":null,"abstract":"<p>This commentary discusses three evaluation approaches for assessing large language models' generation in healthcare: programmatic metrics, human assessment, and large language models judgment. No single approach can address all challenges; however, the combination of these three methods provides a pipeline toward the comprehensive evaluation of medical text generation.\n <figure>\n <div><picture>\n <source></source></picture><p></p>\n </div>\n </figure></p>","PeriodicalId":100913,"journal":{"name":"Medicine Advances","volume":"3 1","pages":"46-49"},"PeriodicalIF":0.0000,"publicationDate":"2025-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/med4.70002","citationCount":"0","resultStr":"{\"title\":\"Toward the Comprehensive Evaluation of Medical Text Generation by Large Language Models: Programmatic Metrics, Human Assessment, and Large Language Models Judgment\",\"authors\":\"Han Yuan\",\"doi\":\"10.1002/med4.70002\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>This commentary discusses three evaluation approaches for assessing large language models' generation in healthcare: programmatic metrics, human assessment, and large language models judgment. No single approach can address all challenges; however, the combination of these three methods provides a pipeline toward the comprehensive evaluation of medical text generation.\\n <figure>\\n <div><picture>\\n <source></source></picture><p></p>\\n </div>\\n </figure></p>\",\"PeriodicalId\":100913,\"journal\":{\"name\":\"Medicine Advances\",\"volume\":\"3 1\",\"pages\":\"46-49\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2025-02-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://onlinelibrary.wiley.com/doi/epdf/10.1002/med4.70002\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Medicine Advances\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1002/med4.70002\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Medicine Advances","FirstCategoryId":"1085","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/med4.70002","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Toward the Comprehensive Evaluation of Medical Text Generation by Large Language Models: Programmatic Metrics, Human Assessment, and Large Language Models Judgment
This commentary discusses three approaches for evaluating text generated by large language models in healthcare: programmatic metrics, human assessment, and large language model judgment. No single approach can address all challenges; combined, however, the three methods provide a pipeline toward the comprehensive evaluation of medical text generation.
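To make the first category concrete, below is a minimal sketch of what a programmatic metric looks like in practice: a unigram-overlap (ROUGE-1 style) score comparing a model-generated sentence against a clinician-written reference. The function name `rouge_1` and the example texts are hypothetical illustrations, not from the commentary; real evaluations would typically rely on established implementations such as ROUGE, BLEU, or BERTScore rather than this simplified version.

```python
from collections import Counter


def rouge_1(reference: str, candidate: str) -> dict:
    """Unigram-overlap (ROUGE-1 style) precision, recall, and F1.

    A simplified programmatic metric: counts how many unigrams in the
    model-generated candidate also appear in the reference text,
    clipping repeated tokens to their count in the reference.
    """
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    # Counter intersection keeps the minimum count per token (clipped overlap).
    overlap = sum((Counter(ref_tokens) & Counter(cand_tokens)).values())
    precision = overlap / len(cand_tokens) if cand_tokens else 0.0
    recall = overlap / len(ref_tokens) if ref_tokens else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical example: a reference discharge-summary sentence vs. an LLM draft.
reference = "Patient was discharged on oral antibiotics for community acquired pneumonia"
generated = "The patient was discharged home on oral antibiotics for pneumonia"
print(rouge_1(reference, generated))
```

Such lexical-overlap scores are cheap and reproducible, which is why they anchor the programmatic tier of the pipeline, but they cannot capture clinical correctness or safety; that gap is what motivates the complementary human assessment and large language model judgment discussed in the commentary.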