Med-MGF：基于多层次图的医疗数据不平衡和代表性处理框架。

IF 3.3 3区医学 Q2 MEDICAL INFORMATICS

BMC Medical Informatics and Decision Making Pub Date : 2024-09-02 DOI:10.1186/s12911-024-02649-2

Tuong Minh Nguyen, Kim Leng Poh, Shu-Ling Chong, Jan Hau Lee

{"title":"Med-MGF：基于多层次图的医疗数据不平衡和代表性处理框架。","authors":"Tuong Minh Nguyen, Kim Leng Poh, Shu-Ling Chong, Jan Hau Lee","doi":"10.1186/s12911-024-02649-2","DOIUrl":null,"url":null,"abstract":"Background: Modeling patient data, particularly electronic health records (EHR), is one of the major focuses of machine learning studies in healthcare, as these records provide clinicians with valuable information that can potentially assist them in disease diagnosis and decision-making.Methods: In this study, we present a multi-level graph-based framework called MedMGF, which models both patient medical profiles extracted from EHR data and their relationship network of health profiles in a single architecture. The medical profiles consist of several layers of data embedding derived from interval records obtained during hospitalization, and the patient-patient network is created by measuring the similarities between these profiles. We also propose a modification to the Focal Loss (FL) function to improve classification performance in imbalanced datasets without the need to imputate the data. MedMGF's performance was evaluated against several Graphical Convolutional Network (GCN) baseline models implemented with Binary Cross Entropy (BCE), FL, class balancing parameter <math><mi>α</mi></math> , and Synthetic Minority Oversampling Technique (SMOTE).Results: Our proposed framework achieved high classification performance (AUC: 0.8098, ACC: 0.7503, SEN: 0.8750, SPE: 0.7445, NPV: 0.9923, PPV: 0.1367) on an extreme imbalanced pediatric sepsis dataset (n=3,014, imbalance ratio of 0.047). It yielded a classification improvement of 3.81% for AUC, 15% for SEN compared to the baseline GCN+ <math><mi>α</mi></math> FL (AUC: 0.7717, ACC: 0.8144, SEN: 0.7250, SPE: 0.8185, PPV: 0.1559, NPV: 0.9847), and an improvement of 5.88% in AUC and 22.5% compared to GCN+FL+SMOTE (AUC: 0.7510, ACC: 0.8431, SEN: 0.6500, SPE: 0.8520, PPV: 0.1688, NPV: 0.9814). It also showed a classification improvement of 3.86% for AUC, 15% for SEN compared to the baseline GCN+ <math><mi>α</mi></math> BCE (AUC: 0.7712, ACC: 0.8133, SEN: 0.7250, SPE: 0.8173, PPV: 0.1551, NPV: 0.9847), and an improvement of 14.33% in AUC and 27.5% in comparison to GCN+BCE+SMOTE (AUC: 0.6665, ACC: 0.7271, SEN: 0.6000, SPE: 0.7329, PPV: 0.0941, NPV: 0.9754).Conclusion: When compared to all baseline models, MedMGF achieved the highest SEN and AUC results, demonstrating the potential for several healthcare applications.","PeriodicalId":9340,"journal":{"name":"BMC Medical Informatics and Decision Making","volume":"24 1","pages":"242"},"PeriodicalIF":3.3000,"publicationDate":"2024-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11367907/pdf/","citationCount":"0","resultStr":"{\"title\":\"Med-MGF: multi-level graph-based framework for handling medical data imbalance and representation.\",\"authors\":\"Tuong Minh Nguyen, Kim Leng Poh, Shu-Ling Chong, Jan Hau Lee\",\"doi\":\"10.1186/s12911-024-02649-2\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Background: Modeling patient data, particularly electronic health records (EHR), is one of the major focuses of machine learning studies in healthcare, as these records provide clinicians with valuable information that can potentially assist them in disease diagnosis and decision-making.Methods: In this study, we present a multi-level graph-based framework called MedMGF, which models both patient medical profiles extracted from EHR data and their relationship network of health profiles in a single architecture. The medical profiles consist of several layers of data embedding derived from interval records obtained during hospitalization, and the patient-patient network is created by measuring the similarities between these profiles. We also propose a modification to the Focal Loss (FL) function to improve classification performance in imbalanced datasets without the need to imputate the data. MedMGF's performance was evaluated against several Graphical Convolutional Network (GCN) baseline models implemented with Binary Cross Entropy (BCE), FL, class balancing parameter <math><mi>α</mi></math> , and Synthetic Minority Oversampling Technique (SMOTE).Results: Our proposed framework achieved high classification performance (AUC: 0.8098, ACC: 0.7503, SEN: 0.8750, SPE: 0.7445, NPV: 0.9923, PPV: 0.1367) on an extreme imbalanced pediatric sepsis dataset (n=3,014, imbalance ratio of 0.047). It yielded a classification improvement of 3.81% for AUC, 15% for SEN compared to the baseline GCN+ <math><mi>α</mi></math> FL (AUC: 0.7717, ACC: 0.8144, SEN: 0.7250, SPE: 0.8185, PPV: 0.1559, NPV: 0.9847), and an improvement of 5.88% in AUC and 22.5% compared to GCN+FL+SMOTE (AUC: 0.7510, ACC: 0.8431, SEN: 0.6500, SPE: 0.8520, PPV: 0.1688, NPV: 0.9814). It also showed a classification improvement of 3.86% for AUC, 15% for SEN compared to the baseline GCN+ <math><mi>α</mi></math> BCE (AUC: 0.7712, ACC: 0.8133, SEN: 0.7250, SPE: 0.8173, PPV: 0.1551, NPV: 0.9847), and an improvement of 14.33% in AUC and 27.5% in comparison to GCN+BCE+SMOTE (AUC: 0.6665, ACC: 0.7271, SEN: 0.6000, SPE: 0.7329, PPV: 0.0941, NPV: 0.9754).Conclusion: When compared to all baseline models, MedMGF achieved the highest SEN and AUC results, demonstrating the potential for several healthcare applications.\",\"PeriodicalId\":9340,\"journal\":{\"name\":\"BMC Medical Informatics and Decision Making\",\"volume\":\"24 1\",\"pages\":\"242\"},\"PeriodicalIF\":3.3000,\"publicationDate\":\"2024-09-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11367907/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"BMC Medical Informatics and Decision Making\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1186/s12911-024-02649-2\",\"RegionNum\":3,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"MEDICAL INFORMATICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"BMC Medical Informatics and Decision Making","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1186/s12911-024-02649-2","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"MEDICAL INFORMATICS","Score":null,"Total":0}

引用次数: 0

摘要

背景：患者数据建模，尤其是电子健康记录（EHR），是医疗领域机器学习研究的重点之一，因为这些记录为临床医生提供了宝贵的信息，有可能帮助他们进行疾病诊断和决策：在本研究中，我们提出了一个基于多层次图的框架，称为 MedMGF，该框架在单一架构中对从电子病历数据中提取的患者医疗档案及其健康档案关系网络进行建模。医疗档案由多层数据嵌入组成，这些数据嵌入来自住院期间获得的间隔记录，而患者-患者网络则是通过测量这些档案之间的相似性创建的。我们还提出了对焦点损失（FL）函数的修改，以提高不平衡数据集的分类性能，而无需对数据进行估算。我们利用二元交叉熵（BCE）、FL、类平衡参数α和合成少数群体过度采样技术（SMOTE）对多个图形卷积网络（GCN）基线模型进行了评估：我们提出的框架取得了很高的分类性能（AUC：0.8098, ACC：0.7503，SEN：0.8750，SPE：0.7445，NPV：0.9923，PPV：0.1367）。与基线 GCN+ α FL（AUC：0.7717, ACC：0.8144，SEN：0.7250，SPE：0.8185，PPV：0.1559，NPV：0.9847）相比，AUC提高了5.88%，SEN提高了22.5%（AUC：0.7510，ACC：0.8144，SEN：0.7250，SPE：0.8185，PPV：0.1559，NPV：0.9847）：0.7510, ACC：AUC：0.7510，ACC：0.8431，SEN：0.6500，SPE：0.8520，PPV：0.1688，NPV：0.9814）。与基线 GCN+ α BCE 相比，它的 AUC 和 SEN 分别提高了 3.86% 和 15%（AUC：0.7712, ACC：0.8133，SEN：0.7250，SPE：0.8173，PPV：0.1551，NPV：0.9847），与 GCN+BCE+SMOTE 相比，AUC 提高了 14.33%，SEN 提高了 27.5%（AUC：0.6665，ACC：0.8133，SEN：0.7250，SPE：0.8173，PPV：0.1551，NPV：0.9847）：0.7271，SEN结论是：与所有基线模型相比，Medicrometo™模型的AUC和NPV分别提高了14.33%和27.5%：与所有基线模型相比，MedMGF 的 SEN 值和 AUC 值最高，显示了其在多种医疗应用中的潜力。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Med-MGF: multi-level graph-based framework for handling medical data imbalance and representation.

Background: Modeling patient data, particularly electronic health records (EHR), is one of the major focuses of machine learning studies in healthcare, as these records provide clinicians with valuable information that can potentially assist them in disease diagnosis and decision-making.

Methods: In this study, we present a multi-level graph-based framework called MedMGF, which models both patient medical profiles extracted from EHR data and their relationship network of health profiles in a single architecture. The medical profiles consist of several layers of data embedding derived from interval records obtained during hospitalization, and the patient-patient network is created by measuring the similarities between these profiles. We also propose a modification to the Focal Loss (FL) function to improve classification performance in imbalanced datasets without the need to imputate the data. MedMGF's performance was evaluated against several Graphical Convolutional Network (GCN) baseline models implemented with Binary Cross Entropy (BCE), FL, class balancing parameter $α$ , and Synthetic Minority Oversampling Technique (SMOTE).

Results: Our proposed framework achieved high classification performance (AUC: 0.8098, ACC: 0.7503, SEN: 0.8750, SPE: 0.7445, NPV: 0.9923, PPV: 0.1367) on an extreme imbalanced pediatric sepsis dataset (n=3,014, imbalance ratio of 0.047). It yielded a classification improvement of 3.81% for AUC, 15% for SEN compared to the baseline GCN+ $α$ FL (AUC: 0.7717, ACC: 0.8144, SEN: 0.7250, SPE: 0.8185, PPV: 0.1559, NPV: 0.9847), and an improvement of 5.88% in AUC and 22.5% compared to GCN+FL+SMOTE (AUC: 0.7510, ACC: 0.8431, SEN: 0.6500, SPE: 0.8520, PPV: 0.1688, NPV: 0.9814). It also showed a classification improvement of 3.86% for AUC, 15% for SEN compared to the baseline GCN+ $α$ BCE (AUC: 0.7712, ACC: 0.8133, SEN: 0.7250, SPE: 0.8173, PPV: 0.1551, NPV: 0.9847), and an improvement of 14.33% in AUC and 27.5% in comparison to GCN+BCE+SMOTE (AUC: 0.6665, ACC: 0.7271, SEN: 0.6000, SPE: 0.7329, PPV: 0.0941, NPV: 0.9754).

Conclusion: When compared to all baseline models, MedMGF achieved the highest SEN and AUC results, demonstrating the potential for several healthcare applications.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

BMC Medical Informatics and Decision Making 医学-医学：信息

CiteScore

7.20

自引率

5.70%

发文量

297

审稿时长

1 months

期刊介绍： BMC Medical Informatics and Decision Making is an open access journal publishing original peer-reviewed research articles in relation to the design, development, implementation, use, and evaluation of health information technologies and decision-making for human health.