Wanxin Li, Saad Ahmed, Yongjin P Park, Khanh Dao Duc
Journal of the American Medical Informatics Association (JCR Q1, Computer Science, Information Systems; impact factor 4.6). Published 2025-09-03. DOI: https://doi.org/10.1093/jamia/ocaf134
Transport-based transfer learning on Electronic Health Records: application to detection of treatment disparities.
Objectives: Electronic Health Records (EHRs) sampled from different populations can introduce unwanted biases, limit individual-level data sharing, and make both the data and fitted models difficult to transfer across population groups. In this context, our main goal is to design an effective method for transferring knowledge between population groups that comes with computable guarantees of suitability and can be applied to quantify treatment disparities.
Materials and methods: For a model trained in an embedded feature space of one subgroup, our proposed framework, Optimal Transport-based Transfer Learning for EHRs (OTTEHR), combines feature embedding of the data and unbalanced optimal transport (OT) for domain adaptation to another population group. To test our method, we processed and divided the MIMIC-III and MIMIC-IV databases into multiple population groups using ICD codes and multiple labels.
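The domain adaptation step described above can be illustrated with a minimal sketch. It is not the authors' OTTEHR implementation: it assumes the feature embedding has already been computed, uses the generic entropic scaling iterations for unbalanced OT with KL-relaxed marginals, and then moves each source point to the plan-weighted average of the target points (a barycentric projection). The function names, the parameters `eps` and `rho`, and the toy embeddings are all hypothetical.

```python
import math

def sq_dist(x, y):
    """Squared Euclidean distance between two embedded points."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))

def unbalanced_sinkhorn(source, target, eps=0.5, rho=1.0, n_iter=200):
    """Entropic unbalanced OT plan with KL-relaxed marginals.

    eps is the entropic regularization strength and rho the marginal
    relaxation strength (both hypothetical defaults for this toy example).
    """
    n, m = len(source), len(target)
    a = [1.0 / n] * n  # uniform weights on source points
    b = [1.0 / m] * m  # uniform weights on target points
    K = [[math.exp(-sq_dist(s, t) / eps) for t in target] for s in source]
    u, v = [1.0] * n, [1.0] * m
    power = rho / (rho + eps)  # exponent < 1 softens the marginal constraints
    for _ in range(n_iter):
        u = [(a[i] / sum(K[i][j] * v[j] for j in range(m))) ** power
             for i in range(n)]
        v = [(b[j] / sum(K[i][j] * u[i] for i in range(n))) ** power
             for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

def barycentric_map(plan, target):
    """Send each source point to the plan-weighted average of target points."""
    dim = len(target[0])
    mapped = []
    for row in plan:
        mass = sum(row)
        mapped.append([sum(w * t[d] for w, t in zip(row, target)) / mass
                       for d in range(dim)])
    return mapped

# Toy embeddings invented for illustration only.
source_emb = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3]]
target_emb = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.3], [1.1, 1.1]]

plan = unbalanced_sinkhorn(source_emb, target_emb)
mapped_source = barycentric_map(plan, target_emb)
```

Under these assumptions, a model fitted on one group's embedding could be evaluated on the other group's points after mapping them into the same region of feature space. The unbalanced formulation lets the two groups differ in size and total mass, which a balanced plan would not allow.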
Results: We derive a theoretical bound for the generalization error of our method and interpret it in terms of the Wasserstein distance, the unbalancedness between the source and target domains, and the labeling divergence; this bound can guide the assessment of suitability for binary classification and regression tasks. In general, our method achieves better accuracy and computational efficiency than standard and machine learning-based transfer learning methods on various tasks. When testing our method on populations with different insurance plans, we detect varying levels of disparity in the duration of hospital stay between groups.
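One ingredient of that interpretation, the Wasserstein distance between source and target, is easy to illustrate in one dimension. The sketch below is not the paper's bound: it only computes the 1-Wasserstein distance between two equal-size, uniformly weighted samples of a single feature, which reduces to the mean absolute difference of the sorted values. The function name and the toy samples are hypothetical.

```python
def wasserstein_1d(xs, ys):
    """W1 distance between two equal-size 1D samples with uniform weights."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    # In 1D the optimal coupling matches sorted values in order.
    return sum(abs(x - y) for x, y in zip(sorted(xs), sorted(ys))) / len(xs)

# Toy feature values invented for illustration only.
source_feature = [0.1, 0.4, 0.2, 0.3]
near_target = [0.2, 0.5, 0.3, 0.4]  # source shifted by 0.1
far_target = [2.1, 2.4, 2.2, 2.3]   # source shifted by 2.0

d_near = wasserstein_1d(source_feature, near_target)
d_far = wasserstein_1d(source_feature, far_target)
```

Here `d_near` is smaller than `d_far`, matching the intuition that a target group closer to the source in feature space is a more suitable candidate for transfer.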
Discussion and conclusion: By leveraging tools from OT theory, our proposed framework makes it possible to compare statistical models on EHR data between different population groups. As a potential application to clinical decision making, we quantify treatment disparities between these groups. Future directions include applying OTTEHR to broader regression and classification tasks and extending the method to semi-supervised learning.
About the journal:
JAMIA is AMIA's premier peer-reviewed journal for biomedical and health informatics. Covering the full spectrum of activities in the field, JAMIA includes informatics articles in the areas of clinical care, clinical research, translational science, implementation science, imaging, education, consumer health, public health, and policy. JAMIA's articles describe innovative informatics research and systems that help to advance biomedical science and to promote health. Case reports, perspectives and reviews also help readers stay connected with the most important informatics developments in implementation, policy and education.