Benchmarking of pre-training strategies for electronic health record foundation models.

IF 3.4 Q2 HEALTH CARE SCIENCES & SERVICES
JAMIA Open Pub Date: 2025-08-13 eCollection Date: 2025-08-01 DOI: 10.1093/jamiaopen/ooaf090
Samson Mataraso, Shreya D'Souza, David Seong, Eloïse Berson, Camilo Espinosa, Nima Aghaeepour
{"title":"电子健康记录基础模型预训练策略的基准测试。","authors":"Samson Mataraso, Shreya D'Souza, David Seong, Eloïse Berson, Camilo Espinosa, Nima Aghaeepour","doi":"10.1093/jamiaopen/ooaf090","DOIUrl":null,"url":null,"abstract":"<p><strong>Objective: </strong>Our objective is to compare different pre-training strategies for electronic health record (EHR) foundation models.</p><p><strong>Materials and methods: </strong>We evaluated three approaches using a transformer-based architecture: baseline (no pre-training), self-supervised pre-training with masked language modeling, and supervised pre-training. The models were assessed on their ability to predict both major adverse cardiac events and mortality occurring within 12 months. The pre-training cohort was 405 679 patients prescribed antihypertensives and the fine tuning cohort was 5525 patients who received doxorubicin.</p><p><strong>Results: </strong>Task-specific supervised pre-training achieved superior performance (AUROC 0.70, AUPRC 0.23), outperforming both self-supervised pre-training and the baseline. However, when the model was evaluated on the task of 12-month mortality prediction, the self-supervised model performed best.</p><p><strong>Discussion: </strong>While supervised pre-training excels when aligned with downstream tasks, self-supervised approaches offer more generalized utility.</p><p><strong>Conclusion: </strong>Pre-training strategy selection should consider intended applications, data availability, and transferability requirements.</p>","PeriodicalId":36278,"journal":{"name":"JAMIA Open","volume":"8 4","pages":"ooaf090"},"PeriodicalIF":3.4000,"publicationDate":"2025-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12349770/pdf/","citationCount":"0","resultStr":"{\"title\":\"Benchmarking of pre-training strategies for electronic health record foundation models.\",\"authors\":\"Samson Mataraso, Shreya D'Souza, David Seong, Eloïse Berson, Camilo Espinosa, Nima Aghaeepour\",\"doi\":\"10.1093/jamiaopen/ooaf090\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Objective: </strong>Our objective is to compare different pre-training strategies for electronic health record (EHR) foundation models.</p><p><strong>Materials and methods: </strong>We evaluated three approaches using a transformer-based architecture: baseline (no pre-training), self-supervised pre-training with masked language modeling, and supervised pre-training. The models were assessed on their ability to predict both major adverse cardiac events and mortality occurring within 12 months. The pre-training cohort was 405 679 patients prescribed antihypertensives and the fine tuning cohort was 5525 patients who received doxorubicin.</p><p><strong>Results: </strong>Task-specific supervised pre-training achieved superior performance (AUROC 0.70, AUPRC 0.23), outperforming both self-supervised pre-training and the baseline. 
However, when the model was evaluated on the task of 12-month mortality prediction, the self-supervised model performed best.</p><p><strong>Discussion: </strong>While supervised pre-training excels when aligned with downstream tasks, self-supervised approaches offer more generalized utility.</p><p><strong>Conclusion: </strong>Pre-training strategy selection should consider intended applications, data availability, and transferability requirements.</p>\",\"PeriodicalId\":36278,\"journal\":{\"name\":\"JAMIA Open\",\"volume\":\"8 4\",\"pages\":\"ooaf090\"},\"PeriodicalIF\":3.4000,\"publicationDate\":\"2025-08-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12349770/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"JAMIA Open\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1093/jamiaopen/ooaf090\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/8/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q2\",\"JCRName\":\"HEALTH CARE SCIENCES & SERVICES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"JAMIA Open","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1093/jamiaopen/ooaf090","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/8/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"HEALTH CARE SCIENCES & SERVICES","Score":null,"Total":0}
Citations: 0

Abstract



Objective: To compare pre-training strategies for electronic health record (EHR) foundation models.

Materials and methods: We evaluated three approaches using a transformer-based architecture: baseline (no pre-training), self-supervised pre-training with masked language modeling, and supervised pre-training. The models were assessed on their ability to predict both major adverse cardiac events and mortality occurring within 12 months. The pre-training cohort comprised 405 679 patients prescribed antihypertensives, and the fine-tuning cohort comprised 5525 patients who received doxorubicin.
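The abstract does not report implementation details, so the following is a minimal, hypothetical sketch of how the two pre-training objectives could be set up on tokenized EHR code sequences using a PyTorch transformer encoder. The vocabulary size, special-token ids, masking rate, and model dimensions are illustrative assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

VOCAB_SIZE = 5000        # assumed EHR code vocabulary size (hypothetical)
PAD_ID, MASK_ID = 0, 1   # assumed special token ids (hypothetical)

class EHRTransformer(nn.Module):
    """Shared encoder backbone; only the head and loss differ by strategy."""
    def __init__(self, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, d_model, padding_idx=PAD_ID)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.mlm_head = nn.Linear(d_model, VOCAB_SIZE)  # self-supervised head
        self.clf_head = nn.Linear(d_model, 1)           # supervised head

    def forward(self, codes):
        return self.encoder(self.embed(codes))  # (batch, seq, d_model)

def mlm_loss(model, codes, mask_rate=0.15):
    """Self-supervised objective: recover randomly masked codes."""
    mask = (torch.rand(codes.shape) < mask_rate) & (codes != PAD_ID)
    masked = codes.masked_fill(mask, MASK_ID)
    logits = model.mlm_head(model(masked))
    return nn.functional.cross_entropy(logits[mask], codes[mask])

def supervised_loss(model, codes, labels):
    """Supervised objective: predict a binary clinical outcome."""
    pooled = model(codes).mean(dim=1)  # mean-pool the sequence
    logits = model.clf_head(pooled).squeeze(-1)
    return nn.functional.binary_cross_entropy_with_logits(logits, labels)

# Toy usage: a batch of 4 padded sequences of 32 codes, random labels.
model = EHRTransformer()
codes = torch.randint(2, VOCAB_SIZE, (4, 32))
labels = torch.randint(0, 2, (4,)).float()
print(mlm_loss(model, codes).item(), supervised_loss(model, codes, labels).item())
```

The point of reusing one backbone is that the baseline, self-supervised, and supervised strategies differ only in which loss (if any) is applied before fine-tuning, which mirrors the comparison the study describes.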

Results: Task-specific supervised pre-training achieved superior performance (AUROC 0.70, AUPRC 0.23), outperforming both self-supervised pre-training and the baseline. However, when evaluated on 12-month mortality prediction, the self-supervised model performed best.
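For reference, both reported metrics can be computed with scikit-learn. Below is a hypothetical illustration with made-up labels and scores standing in for held-out 12-month outcome predictions; the values are not the study's data.

```python
from sklearn.metrics import roc_auc_score, average_precision_score

# Made-up held-out labels and predicted probabilities (illustrative only).
y_true  = [0, 0, 1, 0, 1, 0, 0, 1]
y_score = [0.10, 0.30, 0.70, 0.20, 0.60, 0.40, 0.20, 0.50]

auroc = roc_auc_score(y_true, y_score)            # area under the ROC curve
auprc = average_precision_score(y_true, y_score)  # average precision, the usual AUPRC estimator
print(f"AUROC={auroc:.2f}, AUPRC={auprc:.2f}")
```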

Discussion: While supervised pre-training excels when aligned with downstream tasks, self-supervised approaches offer more generalized utility.

Conclusion: Pre-training strategy selection should consider intended applications, data availability, and transferability requirements.

Source journal: JAMIA Open (Medicine - Health Informatics)
CiteScore: 4.10
Self-citation rate: 4.80%
Annual publications: 102
Review time: 16 weeks