Med-MGF: multi-level graph-based framework for handling medical data imbalance and representation.

Impact factor 3.3 · Zone 3 (Medicine) · JCR Q2 (Medical Informatics)
Tuong Minh Nguyen, Kim Leng Poh, Shu-Ling Chong, Jan Hau Lee
Journal: BMC Medical Informatics and Decision Making (journal article)
Published: 2024-09-02
DOI: 10.1186/s12911-024-02649-2 (https://doi.org/10.1186/s12911-024-02649-2)
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11367907/pdf/
Citation count: 0

Abstract

Background: Modeling patient data, particularly electronic health records (EHR), is one of the major focuses of machine learning studies in healthcare, as these records provide clinicians with valuable information that can potentially assist them in disease diagnosis and decision-making.

Methods: In this study, we present a multi-level graph-based framework called MedMGF, which models both patient medical profiles extracted from EHR data and the network of relationships among those profiles in a single architecture. The medical profiles consist of several layers of data embeddings derived from interval records obtained during hospitalization, and the patient-patient network is created by measuring the similarities between these profiles. We also propose a modification to the Focal Loss (FL) function to improve classification performance on imbalanced datasets without the need to impute the data. MedMGF's performance was evaluated against several Graph Convolutional Network (GCN) baseline models implemented with Binary Cross Entropy (BCE), FL, the class-balancing parameter α, and the Synthetic Minority Oversampling Technique (SMOTE).
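As an illustration of two ingredients named in the Methods, the sketch below shows the standard α-balanced focal loss (not the modified version proposed in the paper, whose form the abstract does not give) and one simple way to build a patient-patient graph from profile vectors. The cosine similarity measure, the threshold value, and all function names are assumptions made for this example; the abstract does not state which similarity measure MedMGF uses.

```python
import math


def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """Standard alpha-balanced focal loss for one binary prediction.

    p is the predicted probability of the positive class, y is 0 or 1.
    This is the textbook form, not the paper's modified loss.
    """
    p_t = p if y == 1 else 1.0 - p              # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    p_t = min(max(p_t, eps), 1.0)               # avoid log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def bce_loss(p, y, eps=1e-12):
    """Binary cross entropy for one prediction, for comparison."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(min(max(p_t, eps), 1.0))


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_edges(profiles, threshold=0.9):
    """Connect patients i < j whose profile similarity reaches the threshold.

    The threshold of 0.9 is a hypothetical value chosen for illustration.
    """
    edges = []
    for i in range(len(profiles)):
        for j in range(i + 1, len(profiles)):
            if cosine_similarity(profiles[i], profiles[j]) >= threshold:
                edges.append((i, j))
    return edges


if __name__ == "__main__":
    # A confidently correct prediction is down-weighted far more by FL than by BCE.
    print(focal_loss(0.9, 1), bce_loss(0.9, 1))
    # Toy profile vectors (made up): only the first two patients are similar.
    print(similarity_edges([[1.0, 0.0], [2.0, 0.1], [0.0, 1.0]]))
```

With gamma set to 0 the focal term disappears and the loss reduces to α-weighted BCE, which is why the baselines in the abstract can be described as combinations of BCE, FL, and α.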

Results: Our proposed framework achieved high classification performance (AUC: 0.8098, ACC: 0.7503, SEN: 0.8750, SPE: 0.7445, NPV: 0.9923, PPV: 0.1367) on an extremely imbalanced pediatric sepsis dataset (n=3,014, imbalance ratio of 0.047). The improvements reported here are absolute differences, expressed in percentage points. Compared with the baseline GCN+αFL (AUC: 0.7717, ACC: 0.8144, SEN: 0.7250, SPE: 0.8185, PPV: 0.1559, NPV: 0.9847), MedMGF improved AUC by 3.81 points and SEN by 15 points; compared with GCN+FL+SMOTE (AUC: 0.7510, ACC: 0.8431, SEN: 0.6500, SPE: 0.8520, PPV: 0.1688, NPV: 0.9814), it improved AUC by 5.88 points and SEN by 22.5 points. Compared with the baseline GCN+αBCE (AUC: 0.7712, ACC: 0.8133, SEN: 0.7250, SPE: 0.8173, PPV: 0.1551, NPV: 0.9847), it improved AUC by 3.86 points and SEN by 15 points; compared with GCN+BCE+SMOTE (AUC: 0.6665, ACC: 0.7271, SEN: 0.6000, SPE: 0.7329, PPV: 0.0941, NPV: 0.9754), it improved AUC by 14.33 points and SEN by 27.5 points.
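The arithmetic behind the reported gains can be checked with a short sketch. It recomputes the AUC and SEN differences from the values quoted in the Results, and uses Bayes' rule to show why PPV stays low at high sensitivity when positives are rare. Treating the imbalance ratio of 0.047 as an approximate positive-class prevalence is an assumption made for this example (the abstract does not define the ratio), and the dictionary keys and function names are hypothetical.

```python
# AUC and SEN values copied from the Results paragraph.
REPORTED = {
    "MedMGF":        {"AUC": 0.8098, "SEN": 0.8750},
    "GCN+aFL":       {"AUC": 0.7717, "SEN": 0.7250},
    "GCN+FL+SMOTE":  {"AUC": 0.7510, "SEN": 0.6500},
    "GCN+aBCE":      {"AUC": 0.7712, "SEN": 0.7250},
    "GCN+BCE+SMOTE": {"AUC": 0.6665, "SEN": 0.6000},
}


def point_gain(metric, baseline, model="MedMGF"):
    """Absolute difference between model and baseline, in percentage points."""
    diff = REPORTED[model][metric] - REPORTED[baseline][metric]
    return round(diff * 100, 2)


def ppv_from_rates(sensitivity, specificity, prevalence):
    """Positive predictive value implied by SEN, SPE and prevalence (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    for baseline in ("GCN+aFL", "GCN+FL+SMOTE", "GCN+aBCE", "GCN+BCE+SMOTE"):
        print(baseline, point_gain("AUC", baseline), point_gain("SEN", baseline))
    # Assumption: 0.047 is used here as an approximate prevalence of positives.
    print(ppv_from_rates(sensitivity=0.8750, specificity=0.7445, prevalence=0.047))
```

Under that prevalence assumption the implied PPV is of the same order as the reported 0.1367; an exact match is not expected, because the evaluation split and the precise definition of the imbalance ratio are not given in the abstract.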

Conclusion: When compared to all baseline models, MedMGF achieved the highest SEN and AUC results, demonstrating the potential for several healthcare applications.

Source journal
CiteScore: 7.20
Self-citation rate: 5.70%
Articles published: 297
Review time: 1 month
About the journal: BMC Medical Informatics and Decision Making is an open access journal publishing original peer-reviewed research articles in relation to the design, development, implementation, use, and evaluation of health information technologies and decision-making for human health.