GPT-4o and the quest for machine learning interpretability in ICU risk of death prediction.

IF 3.8 | Zone 3 (Medicine) | JCR Q2, Medical Informatics

BMC Medical Informatics and Decision Making Pub Date : 2025-10-13 DOI:10.1186/s12911-025-03224-z

Moein E Samadi, Kateryna Nikulina, Sebastian Johannes Fritsch, Andreas Schuppert

BMC Medical Informatics and Decision Making 25(1), 373 (2025). Publication type: Journal Article.

Citations: 0

Abstract

Background: Clinical utilization of machine learning is hampered by the lack of interpretability inherent in most non-linear black box modeling approaches, which reduces trust among clinicians and regulators. Advanced large language models offer a framework for integrating medical knowledge into these models, potentially enhancing their interpretability.

Methods: A hybrid mechanistic/data-driven modeling framework is presented for developing an ICU risk of death prediction model for mechanically ventilated patients. In the mechanistic modeling part, GPT-4o is used to generate detailed medical feature descriptions, which are then aggregated into a comprehensive corpus and processed with TF-IDF vectorization. Fuzzy C-means clustering is subsequently applied to these vectorized features to identify significant mortality cause-specific feature clusters, and a physician reviewed the resulting clusters to validate their relevance to actionable insights for clinical decision support. In the data-driven part, the identified clusters inform the creation of XGBoost-based weak classifiers, whose outcomes are combined into a single XGBoost-based strong classifier through a hierarchically structured feed-forward network. This process results in a novel GPT hybrid model for ICU risk of death prediction.
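The mechanistic part of the Methods (feature descriptions, TF-IDF vectorization, fuzzy C-means clustering) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the feature descriptions below are hand-written placeholders rather than GPT-4o output, the function names `tfidf_vectors` and `fuzzy_c_means` are hypothetical, and the smoothed IDF variant, the fuzzifier `m = 2`, and the choice of two clusters are assumptions made only for illustration.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalised TF-IDF vectors) for whitespace-tokenised docs."""
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenised for tok in toks})
    n_docs = len(docs)
    doc_freq = Counter(tok for toks in tokenised for tok in set(toks))
    # Smoothed inverse document frequency (one common variant; an assumption here).
    idf = {tok: math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1.0 for tok in vocab}
    vectors = []
    for toks in tokenised:
        counts = Counter(toks)
        vec = [counts[tok] * idf[tok] for tok in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocab, vectors


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, seed=0):
    """Textbook fuzzy C-means; returns (centres, membership matrix)."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    memberships = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        memberships.append([r / total for r in row])
    centres = []
    for _ in range(n_iter):
        # Centre update: mean of the points weighted by membership ** m.
        centres = []
        for k in range(n_clusters):
            weights = [row[k] ** m for row in memberships]
            total = sum(weights)
            centres.append(
                [sum(w * p[d] for w, p in zip(weights, points)) / total for d in range(dim)]
            )
        # Membership update: proportional to distance ** (-2 / (m - 1)).
        for i, p in enumerate(points):
            dists = [max(math.dist(p, c), 1e-12) for c in centres]
            inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
            total = sum(inv)
            memberships[i] = [v / total for v in inv]
    return centres, memberships


# Placeholder descriptions (hypothetical; in the paper these come from GPT-4o).
descriptions = {
    "bilirubin": "liver function marker elevated in liver failure",
    "alt": "liver enzyme elevated in liver injury",
    "creatinine": "kidney function marker elevated in renal failure",
    "urea": "kidney waste product elevated in renal failure",
}
vocab, vectors = tfidf_vectors(list(descriptions.values()))
centres, memberships = fuzzy_c_means(vectors, n_clusters=2)
for name, row in zip(descriptions, memberships):
    print(name, [round(v, 3) for v in row])
```

Each feature receives a graded membership in every cluster rather than a single hard label, so a feature relevant to more than one cause of death can contribute to several clusters. This is the property that distinguishes fuzzy C-means from hard clustering such as k-means.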

Results: This study enrolled 16,018 mechanically ventilated ICU patients, divided into derivation (12,758) and validation (3,260) cohorts, to develop and evaluate a GPT hybrid model for predicting in-ICU death. Leveraging GPT-4o, we implemented an automated process for clustering mortality cause-specific features, resulting in six feature clusters: Liver Failure, Infection, Renal Failure, Hypoxia, Cardiac Failure, and Mechanical Ventilation. This approach significantly improved upon previous manual methods, automating the reconstruction of structured hybrid models. While the GPT hybrid model showed similar predictive accuracy to a Global XGBoost model, it demonstrated superior interpretability and clinical relevance by incorporating a wider array of features and providing a hierarchical structure of feature importance aligned with medical knowledge.
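The hierarchical structure referred to above (one weak classifier per feature cluster, whose scores feed a single strong classifier) can be sketched as a two-stage model. This is an illustration under stated assumptions, not the authors' pipeline: a plain gradient-descent logistic regression stands in for the XGBoost classifiers so that the example needs only the standard library, the data are synthetic, and the cluster names and column assignments in `clusters` are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Clamp the input so math.exp stays within floating-point range.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, lr=0.1, epochs=200):
    """Batch gradient-descent logistic regression (stand-in for an XGBoost classifier)."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - y
            grad_b += err
            for j, v in enumerate(x):
                grad_w[j] += err * v
        bias -= lr * grad_b / len(rows)
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
    return weights, bias


def predict(model, rows):
    weights, bias = model
    return [sigmoid(bias + sum(w * v for w, v in zip(weights, x))) for x in rows]


def select(rows, columns):
    """Keep only the columns that belong to one feature cluster."""
    return [[row[c] for c in columns] for row in rows]


# Hypothetical clusters: column indices assigned to each cause-specific cluster.
clusters = {"renal": [0, 1], "hepatic": [2, 3]}

# Synthetic data: four features, binary outcome driven mostly by the "renal" columns.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(200)]
y = [1 if x[0] + x[1] + 0.5 * x[2] + rng.gauss(0, 0.5) > 0 else 0 for x in X]

# Stage 1: one weak model per cluster, trained on that cluster's features only.
weak_models = {name: fit_logistic(select(X, cols), y) for name, cols in clusters.items()}

# Stage 2: the strong model sees only the cluster-level risk scores.
# (A real pipeline would use out-of-fold scores here to avoid leakage.)
cluster_scores = [
    list(scores)
    for scores in zip(*(predict(weak_models[n], select(X, c)) for n, c in clusters.items()))
]
strong_model = fit_logistic(cluster_scores, y)
risk = predict(strong_model, cluster_scores)

# The stage-2 weights give one importance value per cluster.
for name, weight in zip(clusters, strong_model[0]):
    print(name, round(weight, 3))
```

Because the strong model takes one input per cluster, its weights can be read as cluster-level importance, and each weak model can then be inspected for feature-level importance within its cluster. That two-level reading is what a single global model over all features does not provide directly.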

Conclusion: We introduce a novel approach to predicting in-ICU risk of death for mechanically ventilated patients using a GPT hybrid model. Our methodology demonstrates the potential of integrating large language models with traditional machine learning techniques to create interpretable and clinically relevant predictive models.


Source journal

BMC Medical Informatics and Decision Making (Medicine: Medical Informatics)

CiteScore: 7.20

Self-citation rate: 5.70%

Articles published: 297

Review time: 1 month

About the journal: BMC Medical Informatics and Decision Making is an open access journal publishing original peer-reviewed research articles in relation to the design, development, implementation, use, and evaluation of health information technologies and decision-making for human health.