SHREC: A Framework for Advancing Next-Generation Computational Phenotyping with Large Language Models.

arXiv publication date: 2025-07-17
{"title":"SHREC: A Framework for Advancing Next-Generation Computational Phenotyping with Large Language Models.","authors":"","doi":"","DOIUrl":null,"url":null,"abstract":"<p><strong>Objective: </strong>Computational phenotyping is a central informatics activity with resulting cohorts supporting a wide variety of applications. However, it is time-intensive because of manual data review, limited automation, and difficulties in adapting algorithms across sources. Since LLMs have demonstrated promising capabilities for text classification, comprehension, and generation, we posit they will perform well at repetitive manual review tasks traditionally performed by human experts. To support next-generation computational phenotyping methods, we developed SHREC, a framework for comprehensive integration of LLMs into end-to-end phenotyping pipelines.</p><p><strong>Methods: </strong>We applied and tested the ability of three lightweight LLMs (Gemma2 27 billion, Mistral Small 24 billion, and Phi-4 14 billion) to classify concepts and phenotype patients using previously developed phenotypes for ARF respiratory support therapies.</p><p><strong>Results: </strong>All models performed well on concept classification, with the best model (Mistral) achieving an AUROC of 0.896 across all relevant concepts. For phenotyping, models demonstrated near-perfect specificity for all phenotypes, and the top-performing model (Mistral) reached an average AUROC of 0.853 for single-therapy phenotypes, despite lower performance on multi-therapy phenotypes.</p><p><strong>Conclusion: </strong>Current lightweight LLMs can feasibly assist researchers with resource-intensive phenotyping tasks such as manual data review. There are several advantages of LLMs that support their application to computational phenotyping, such as their ability to adapt to new tasks with prompt engineering alone and their ability to incorporate raw EHR data. Future steps to advance next-generation phenotyping methods include determining optimal strategies for integrating biomedical data, exploring how LLMs reason, and advancing generative model methods.</p>","PeriodicalId":93888,"journal":{"name":"ArXiv","volume":" ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2025-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12288648/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ArXiv","FirstCategoryId":"1085","ListUrlMain":"","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Objective: Computational phenotyping is a central informatics activity whose resulting cohorts support a wide variety of applications. However, it is time-intensive because of manual data review, limited automation, and difficulties in adapting algorithms across sources. Since large language models (LLMs) have demonstrated promising capabilities for text classification, comprehension, and generation, we posit they will perform well at repetitive manual review tasks traditionally performed by human experts. To support next-generation computational phenotyping methods, we developed SHREC, a framework for comprehensive integration of LLMs into end-to-end phenotyping pipelines.

Methods: We applied three lightweight LLMs (Gemma 2 27B, Mistral Small 24B, and Phi-4 14B) and tested their ability to classify concepts and to phenotype patients using previously developed phenotypes for acute respiratory failure (ARF) respiratory support therapies.
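To make the review task concrete, the following is a minimal sketch of what an LLM-based concept-classification call might look like, assuming one of the lightweight models is served through a local Ollama-style endpoint. The endpoint URL, model tag, prompt wording, concept label, and example flowsheet entry are all illustrative assumptions, not the authors' actual SHREC implementation.

```python
# Hypothetical sketch of LLM-based concept classification for EHR review.
# Endpoint, model tag, prompt, and concept names are assumptions.
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed local Ollama server

PROMPT_TEMPLATE = (
    "You are reviewing an EHR flowsheet entry.\n"
    "Entry: {entry}\n"
    "Question: Does this entry indicate the concept '{concept}'? "
    "Answer with a single token, YES or NO."
)

def classify_concept(entry: str, concept: str, model: str = "mistral-small") -> bool:
    """Ask a lightweight local LLM whether an EHR entry matches a concept."""
    payload = {
        "model": model,
        "prompt": PROMPT_TEMPLATE.format(entry=entry, concept=concept),
        "stream": False,  # return one JSON object rather than a token stream
    }
    response = requests.post(OLLAMA_URL, json=payload, timeout=120)
    response.raise_for_status()
    answer = response.json()["response"].strip().upper()
    return answer.startswith("YES")

if __name__ == "__main__":
    # Hypothetical flowsheet row screened for an invasive-ventilation concept.
    row = "O2 device: endotracheal tube; vent mode: AC/VC"
    print(classify_concept(row, "invasive mechanical ventilation"))
```

Because the task is expressed entirely in the prompt, the same call can be repeated over every candidate row and concept, which is the repetitive review work the abstract argues LLMs can absorb.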

Results: All models performed well on concept classification, with the best model (Mistral) achieving an AUROC of 0.896 across all relevant concepts. For phenotyping, models demonstrated near-perfect specificity for all phenotypes, and the top-performing model (Mistral) reached an average AUROC of 0.853 for single-therapy phenotypes, despite lower performance on multi-therapy phenotypes.
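The per-concept and macro-averaged AUROC figures reported above can be computed with standard tooling once model confidence scores and gold-standard labels from manual review are in hand. This is a minimal sketch using scikit-learn; the concept names, scores, and labels below are fabricated placeholders for illustration only.

```python
# Illustrative per-concept AUROC evaluation; data values are placeholders.
import numpy as np
from sklearn.metrics import roc_auc_score

# gold[concept]: expert labels from manual review (1 = concept present)
# scores[concept]: model confidence that the answer is YES
gold = {
    "invasive_ventilation": np.array([1, 0, 1, 1, 0, 0]),
    "high_flow_nasal_cannula": np.array([0, 1, 0, 0, 1, 1]),
}
scores = {
    "invasive_ventilation": np.array([0.92, 0.10, 0.85, 0.64, 0.30, 0.05]),
    "high_flow_nasal_cannula": np.array([0.20, 0.88, 0.15, 0.40, 0.75, 0.90]),
}

per_concept = {c: roc_auc_score(gold[c], scores[c]) for c in gold}
macro_auroc = float(np.mean(list(per_concept.values())))

for concept, auc in per_concept.items():
    print(f"{concept}: AUROC = {auc:.3f}")
print(f"Macro-average AUROC = {macro_auroc:.3f}")
```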

Conclusion: Current lightweight LLMs can feasibly assist researchers with resource-intensive phenotyping tasks such as manual data review. Several properties of LLMs support their application to computational phenotyping, such as their ability to adapt to new tasks through prompt engineering alone and to incorporate raw electronic health record (EHR) data. Future steps toward next-generation phenotyping methods include determining optimal strategies for integrating biomedical data, exploring how LLMs reason, and advancing generative model methods.
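The "adapt with prompt engineering alone" point amounts to this: moving to a new phenotype changes only the instruction text, not any model weights. A small sketch of that pattern follows; the phenotype keys and prompt wording are hypothetical examples, not definitions from the paper.

```python
# Sketch of task adaptation via prompts alone: one model, many phenotypes.
# Phenotype names and prompt text are hypothetical illustrations.
PHENOTYPE_PROMPTS = {
    "imv_only": (
        "Given this patient's daily respiratory support summary, answer YES "
        "if the patient received invasive mechanical ventilation and no other "
        "support therapy, otherwise NO.\nSummary: {summary}"
    ),
    "hfnc_only": (
        "Given this patient's daily respiratory support summary, answer YES "
        "if the patient received high-flow nasal cannula and no other "
        "support therapy, otherwise NO.\nSummary: {summary}"
    ),
}

def phenotype_prompt(phenotype: str, summary: str) -> str:
    """Build the phenotyping prompt; no retraining is needed between tasks."""
    return PHENOTYPE_PROMPTS[phenotype].format(summary=summary)
```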
