Songsoo Kim, Donghyun Kim, Jaewoong Kim, Jalim Koo, Jinsik Yoon, Dukyong Yoon
{"title":"语境学习与大语言模型:一个简单而有效的方法来提高放射学报告标签。","authors":"Songsoo Kim, Donghyun Kim, Jaewoong Kim, Jalim Koo, Jinsik Yoon, Dukyong Yoon","doi":"10.4258/hir.2025.31.3.295","DOIUrl":null,"url":null,"abstract":"<p><strong>Objectives: </strong>This study assessed the effectiveness of in-context learning using Generative Pre-trained Transformer-4 (GPT-4) for labeling radiology reports.</p><p><strong>Methods: </strong>In this retrospective study, radiology reports were obtained from the Medical Information Mart for Intensive Care III database. Two structured prompts-the \"basic prompt\" and the \"in-context prompt\"- were compared. An optimization experiment was conducted to assess consistency and the occurrence of output format errors. The primary labeling experiments were performed on 200 unseen head computed tomography (CT) reports for multilabel classification of predefined labels (Experiment 1) and on 400 unseen abdominal CT reports for multi-label classification of actionable findings (Experiment 2).</p><p><strong>Results: </strong>The inter-reader accuracies in Experiments 1 and 2 were 0.93 and 0.84, respectively. For multi-label classification of head CT reports (Experiment 1), the in-context prompt led to notable increases in F1-scores for the \"foreign body\" and \"mass\" labels (gains of 0.66 and 0.22, respectively). However, improvements for other labels were modest. In multi-label classification of abdominal CT reports (Experiment 2), in-context prompts produced substantial improvements in F1-scores across all labels compared to basic prompts. Providing context equipped the model with domain-specific knowledge and helped align its existing knowledge, thereby improving performance.</p><p><strong>Conclusions: </strong>Incontext learning with GPT-4 consistently improved performance in labeling radiology reports. This approach is particularly effective for subjective labeling tasks and allows the model to align its criteria with those of human annotators for objective labeling. This practical strategy offers a simple, adaptable, and researcher-oriented method that can be applied to diverse labeling tasks.</p>","PeriodicalId":12947,"journal":{"name":"Healthcare Informatics Research","volume":"31 3","pages":"295-309"},"PeriodicalIF":2.1000,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12370419/pdf/","citationCount":"0","resultStr":"{\"title\":\"In-Context Learning with Large Language Models: A Simple and Effective Approach to Improve Radiology Report Labeling.\",\"authors\":\"Songsoo Kim, Donghyun Kim, Jaewoong Kim, Jalim Koo, Jinsik Yoon, Dukyong Yoon\",\"doi\":\"10.4258/hir.2025.31.3.295\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Objectives: </strong>This study assessed the effectiveness of in-context learning using Generative Pre-trained Transformer-4 (GPT-4) for labeling radiology reports.</p><p><strong>Methods: </strong>In this retrospective study, radiology reports were obtained from the Medical Information Mart for Intensive Care III database. Two structured prompts-the \\\"basic prompt\\\" and the \\\"in-context prompt\\\"- were compared. An optimization experiment was conducted to assess consistency and the occurrence of output format errors. 
The primary labeling experiments were performed on 200 unseen head computed tomography (CT) reports for multilabel classification of predefined labels (Experiment 1) and on 400 unseen abdominal CT reports for multi-label classification of actionable findings (Experiment 2).</p><p><strong>Results: </strong>The inter-reader accuracies in Experiments 1 and 2 were 0.93 and 0.84, respectively. For multi-label classification of head CT reports (Experiment 1), the in-context prompt led to notable increases in F1-scores for the \\\"foreign body\\\" and \\\"mass\\\" labels (gains of 0.66 and 0.22, respectively). However, improvements for other labels were modest. In multi-label classification of abdominal CT reports (Experiment 2), in-context prompts produced substantial improvements in F1-scores across all labels compared to basic prompts. Providing context equipped the model with domain-specific knowledge and helped align its existing knowledge, thereby improving performance.</p><p><strong>Conclusions: </strong>Incontext learning with GPT-4 consistently improved performance in labeling radiology reports. This approach is particularly effective for subjective labeling tasks and allows the model to align its criteria with those of human annotators for objective labeling. This practical strategy offers a simple, adaptable, and researcher-oriented method that can be applied to diverse labeling tasks.</p>\",\"PeriodicalId\":12947,\"journal\":{\"name\":\"Healthcare Informatics Research\",\"volume\":\"31 3\",\"pages\":\"295-309\"},\"PeriodicalIF\":2.1000,\"publicationDate\":\"2025-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12370419/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Healthcare Informatics Research\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.4258/hir.2025.31.3.295\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/7/31 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q3\",\"JCRName\":\"MEDICAL INFORMATICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Healthcare Informatics Research","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.4258/hir.2025.31.3.295","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/7/31 0:00:00","PubModel":"Epub","JCR":"Q3","JCRName":"MEDICAL INFORMATICS","Score":null,"Total":0}
In-Context Learning with Large Language Models: A Simple and Effective Approach to Improve Radiology Report Labeling.
Objectives: This study assessed the effectiveness of in-context learning using Generative Pre-trained Transformer-4 (GPT-4) for labeling radiology reports.
Methods: In this retrospective study, radiology reports were obtained from the Medical Information Mart for Intensive Care III database. Two structured prompts, the "basic prompt" and the "in-context prompt," were compared. An optimization experiment was conducted to assess consistency and the occurrence of output format errors. The primary labeling experiments were performed on 200 unseen head computed tomography (CT) reports for multi-label classification of predefined labels (Experiment 1) and on 400 unseen abdominal CT reports for multi-label classification of actionable findings (Experiment 2).
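To make the prompt comparison concrete, the sketch below illustrates what a "basic prompt" and an "in-context prompt" for multi-label report labeling might look like when calling GPT-4 through the OpenAI Python client. The label set, prompt wording, labeling criteria, example report, and API parameters are illustrative assumptions and do not reproduce the prompts or configuration used in the study.

```python
# Illustrative sketch only: the labels, prompt text, and criteria below are assumptions
# for demonstration and are not the prompts used in the study.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

LABELS = ["hemorrhage", "mass", "foreign body"]  # hypothetical subset of predefined labels

BASIC_PROMPT = (
    "You are labeling radiology reports. For the report below, return a JSON object "
    f"with keys {LABELS}, each set to 1 if the finding is present, otherwise 0."
)

# The "in-context prompt" additionally supplies labeling criteria and a worked example,
# so the model's decision rules can align with those of the human annotators.
IN_CONTEXT_PROMPT = BASIC_PROMPT + (
    "\n\nLabeling criteria:\n"
    "- foreign body: label 1 only for retained or traumatic objects, "
    "not for devices placed intentionally.\n"
    "\nExample report: 'Postoperative change with surgical clip in place.'\n"
    'Example output: {"hemorrhage": 0, "mass": 0, "foreign body": 0}'
)

def label_report(report_text: str, system_prompt: str) -> dict:
    """Send one report to GPT-4 and parse the multi-label JSON output."""
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,  # favor deterministic, format-consistent output
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": report_text},
        ],
    )
    return json.loads(response.choices[0].message.content)

# Usage: compare the two prompting strategies on the same report.
# report = open("head_ct_report_001.txt").read()
# print(label_report(report, BASIC_PROMPT))
# print(label_report(report, IN_CONTEXT_PROMPT))
```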
Results: The inter-reader accuracies in Experiments 1 and 2 were 0.93 and 0.84, respectively. For multi-label classification of head CT reports (Experiment 1), the in-context prompt led to notable increases in F1-scores for the "foreign body" and "mass" labels (gains of 0.66 and 0.22, respectively). However, improvements for other labels were modest. In multi-label classification of abdominal CT reports (Experiment 2), in-context prompts produced substantial improvements in F1-scores across all labels compared to basic prompts. Providing context equipped the model with domain-specific knowledge and helped align its existing knowledge with the annotators' labeling criteria, thereby improving performance.
Conclusions: In-context learning with GPT-4 consistently improved performance in labeling radiology reports. This approach is particularly effective for subjective labeling tasks and allows the model to align its criteria with those of human annotators for objective labeling. This practical strategy offers a simple, adaptable, and researcher-oriented method that can be applied to diverse labeling tasks.