Samson Mataraso, Shreya D'Souza, David Seong, Eloïse Berson, Camilo Espinosa, Nima Aghaeepour
{"title":"电子健康记录基础模型预训练策略的基准测试。","authors":"Samson Mataraso, Shreya D'Souza, David Seong, Eloïse Berson, Camilo Espinosa, Nima Aghaeepour","doi":"10.1093/jamiaopen/ooaf090","DOIUrl":null,"url":null,"abstract":"<p><strong>Objective: </strong>Our objective is to compare different pre-training strategies for electronic health record (EHR) foundation models.</p><p><strong>Materials and methods: </strong>We evaluated three approaches using a transformer-based architecture: baseline (no pre-training), self-supervised pre-training with masked language modeling, and supervised pre-training. The models were assessed on their ability to predict both major adverse cardiac events and mortality occurring within 12 months. The pre-training cohort was 405 679 patients prescribed antihypertensives and the fine tuning cohort was 5525 patients who received doxorubicin.</p><p><strong>Results: </strong>Task-specific supervised pre-training achieved superior performance (AUROC 0.70, AUPRC 0.23), outperforming both self-supervised pre-training and the baseline. However, when the model was evaluated on the task of 12-month mortality prediction, the self-supervised model performed best.</p><p><strong>Discussion: </strong>While supervised pre-training excels when aligned with downstream tasks, self-supervised approaches offer more generalized utility.</p><p><strong>Conclusion: </strong>Pre-training strategy selection should consider intended applications, data availability, and transferability requirements.</p>","PeriodicalId":36278,"journal":{"name":"JAMIA Open","volume":"8 4","pages":"ooaf090"},"PeriodicalIF":3.4000,"publicationDate":"2025-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12349770/pdf/","citationCount":"0","resultStr":"{\"title\":\"Benchmarking of pre-training strategies for electronic health record foundation models.\",\"authors\":\"Samson Mataraso, Shreya D'Souza, David Seong, Eloïse Berson, Camilo Espinosa, Nima Aghaeepour\",\"doi\":\"10.1093/jamiaopen/ooaf090\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Objective: </strong>Our objective is to compare different pre-training strategies for electronic health record (EHR) foundation models.</p><p><strong>Materials and methods: </strong>We evaluated three approaches using a transformer-based architecture: baseline (no pre-training), self-supervised pre-training with masked language modeling, and supervised pre-training. The models were assessed on their ability to predict both major adverse cardiac events and mortality occurring within 12 months. The pre-training cohort was 405 679 patients prescribed antihypertensives and the fine tuning cohort was 5525 patients who received doxorubicin.</p><p><strong>Results: </strong>Task-specific supervised pre-training achieved superior performance (AUROC 0.70, AUPRC 0.23), outperforming both self-supervised pre-training and the baseline. 
However, when the model was evaluated on the task of 12-month mortality prediction, the self-supervised model performed best.</p><p><strong>Discussion: </strong>While supervised pre-training excels when aligned with downstream tasks, self-supervised approaches offer more generalized utility.</p><p><strong>Conclusion: </strong>Pre-training strategy selection should consider intended applications, data availability, and transferability requirements.</p>\",\"PeriodicalId\":36278,\"journal\":{\"name\":\"JAMIA Open\",\"volume\":\"8 4\",\"pages\":\"ooaf090\"},\"PeriodicalIF\":3.4000,\"publicationDate\":\"2025-08-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12349770/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"JAMIA Open\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1093/jamiaopen/ooaf090\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/8/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q2\",\"JCRName\":\"HEALTH CARE SCIENCES & SERVICES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"JAMIA Open","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1093/jamiaopen/ooaf090","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/8/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"HEALTH CARE SCIENCES & SERVICES","Score":null,"Total":0}
Benchmarking of pre-training strategies for electronic health record foundation models.
Objective: To compare pre-training strategies for electronic health record (EHR) foundation models.
Materials and methods: We evaluated three approaches using a transformer-based architecture: a baseline with no pre-training, self-supervised pre-training with masked language modeling, and supervised pre-training. The models were assessed on their ability to predict both major adverse cardiac events and mortality occurring within 12 months. The pre-training cohort comprised 405 679 patients prescribed antihypertensives, and the fine-tuning cohort comprised 5525 patients who received doxorubicin.
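To make the self-supervised objective concrete, the sketch below shows a masked language modeling step over tokenized EHR code sequences, in the style used for transformer pre-training. It is a minimal illustration, not the paper's implementation: the model class, vocabulary size, mask token, and hyperparameters are all assumptions.

```python
# Hypothetical sketch of masked language modeling (MLM) pre-training on EHR
# code sequences; names and hyperparameters are illustrative, not from the paper.
import torch
import torch.nn as nn

VOCAB_SIZE = 1000   # number of distinct EHR codes (assumed)
MASK_ID = 0         # reserved token id used for masking (assumed)
D_MODEL = 128

class EHRMLMModel(nn.Module):
    """Transformer encoder that reconstructs masked EHR codes."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, D_MODEL)
        layer = nn.TransformerEncoderLayer(d_model=D_MODEL, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(D_MODEL, VOCAB_SIZE)  # predicts the original code

    def forward(self, codes):
        return self.head(self.encoder(self.embed(codes)))

def mlm_step(model, codes, mask_prob=0.15):
    """One pre-training step: mask a random subset of codes and predict them."""
    mask = torch.rand(codes.shape) < mask_prob
    inputs = codes.masked_fill(mask, MASK_ID)
    logits = model(inputs)
    # Loss is computed only on the masked positions.
    return nn.functional.cross_entropy(logits[mask], codes[mask])

model = EHRMLMModel()
codes = torch.randint(1, VOCAB_SIZE, (8, 64))  # batch of 8 synthetic code sequences
loss = mlm_step(model, codes)
loss.backward()
```

Supervised pre-training differs only in the head and loss: instead of reconstructing masked codes, the model is trained directly against a clinical label on the pre-training cohort before fine-tuning.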
Results: Task-specific supervised pre-training achieved the best performance for major adverse cardiac event prediction (AUROC 0.70, AUPRC 0.23), outperforming both self-supervised pre-training and the baseline. When the models were evaluated on 12-month mortality prediction, however, the self-supervised model performed best.
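For readers unfamiliar with the two reported metrics, the snippet below shows how AUROC and AUPRC are typically computed for a binary outcome such as 12-month major adverse cardiac events. The arrays are synthetic placeholders, not data from the study.

```python
# Minimal sketch of AUROC/AUPRC computation with scikit-learn; inputs are synthetic.
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

y_true = np.array([0, 0, 1, 0, 1, 0, 0, 1])                    # observed outcomes
y_score = np.array([0.1, 0.3, 0.8, 0.2, 0.6, 0.4, 0.1, 0.7])   # model risk scores

auroc = roc_auc_score(y_true, y_score)            # area under the ROC curve
auprc = average_precision_score(y_true, y_score)  # area under the precision-recall curve
print(f"AUROC={auroc:.2f}, AUPRC={auprc:.2f}")
```

AUPRC is reported alongside AUROC because both outcomes are rare, and precision-recall curves are more informative than ROC curves under class imbalance.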
Discussion: While supervised pre-training excels when aligned with downstream tasks, self-supervised approaches offer more generalized utility.
Conclusion: Pre-training strategy selection should consider intended applications, data availability, and transferability requirements.