A lightweight graph neural network to predict long-term mortality in coronary artery disease patients: an interpretable causality-aware approach
Mohammad Yaseliani, Md. Noor-E-Alam, Osama Dasa, Xiaochen Xian, Carl J. Pepine, Md Mahmudul Hasan
Journal of Biomedical Informatics, vol. 167, Article 104846. Published 2025-05-11. DOI: 10.1016/j.jbi.2025.104846
https://www.sciencedirect.com/science/article/pii/S1532046425000759
Abstract
Background
Coronary artery disease (CAD) causes a substantial death toll in the United States and worldwide. Traditional methods for CAD mortality prediction are based on established risk factors, but they have significant limitations in accuracy, in adaptability to diverse populations, in individual-level (as opposed to group-level) risk prediction, and in accounting for socioeconomic and lifestyle variations. Machine learning (ML) models have demonstrated superior performance in CAD prediction; however, they often struggle to capture complex data interactions that can affect mortality.
Methods
We proposed lightweight, interpretable graph neural network (GNN) models that use data from a large trial of hypertensive patients with CAD to predict mortality from a concise set of critical features. This smaller feature set can improve efficiency and ease implementation in clinical settings, while the models' "lightweight" nature facilitates fast real-time applications. We used a hybrid approach that first applies logistic regression (LR) to identify statistically significant features and then applies propensity score matching (PSM) to identify potentially causal features. These causal features, alongside demographic variables, were used to create a graph of patients, with edges drawn between patients who have similar causal features. Accordingly, lightweight 5-layer graph convolutional network (GCN) and graph attention network (GAT) models were designed for mortality prediction, followed by an interpretability method (i.e., GNNExplainer) to report feature importance.
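The graph construction and a single graph convolution step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the similarity rule, so the mismatch threshold (`max_mismatches`), the toy binary features, and the weight values are all hypothetical, and the layer follows the standard GCN propagation rule ReLU(D^-1/2 (A + I) D^-1/2 H W) rather than any detail confirmed by the paper.

```python
import math


def build_patient_graph(features, max_mismatches=0):
    """Connect two patients when their feature vectors differ in at most
    `max_mismatches` positions (an assumed similarity rule)."""
    n = len(features)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            mismatches = sum(a != b for a, b in zip(features[i], features[j]))
            if mismatches <= max_mismatches:
                adj[i][j] = adj[j][i] = 1.0
    return adj


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2, the symmetric normalization used by GCNs."""
    n = len(adj)
    # Add self-loops so each patient keeps its own features during aggregation.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(norm_adj, h, weight):
    """One graph convolution: ReLU(norm_adj @ h @ weight)."""
    z = matmul(matmul(norm_adj, h), weight)
    return [[max(0.0, v) for v in row] for row in z]


# Toy data: four hypothetical patients with three binary features each.
patients = [
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 0],
]
adj = build_patient_graph(patients, max_mismatches=1)
norm_adj = normalize_adjacency(adj)

# Node features are the patient vectors; the 3x2 weight matrix is arbitrary.
h = [[float(v) for v in row] for row in patients]
weight = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.6]]
hidden = gcn_layer(norm_adj, h, weight)
```

In this toy graph, patients 0, 1, and 3 are mutually connected, while patient 2 has no neighbours and only sees its own features through the self-loop. A 5-layer model like the one described would stack five such layers with learned weights; a GAT would replace the fixed normalization with learned attention coefficients.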
Results
The proposed GCN achieved a recall of 93.02% and a negative predictive value (NPV) of 89.42%, both higher than those of all other classifiers. Accordingly, a web-based decision support system (DSS), called CAD-SS, was developed. It can predict mortality and identify risk factors and similar patients, guiding clinicians toward reliable and informed decision-making.
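For readers less familiar with the two reported metrics, the sketch below shows how recall and NPV are computed from binary predictions. The labels are made-up toy values chosen for easy arithmetic; they are not the study's data and do not reproduce its results, and the function names are illustrative.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels where 1 means death."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def recall(tp, fn):
    """Share of actual deaths that the model flagged: TP / (TP + FN)."""
    return tp / (tp + fn)


def negative_predictive_value(tn, fn):
    """Share of predicted survivors who actually survived: TN / (TN + FN)."""
    return tn / (tn + fn)


# Toy labels: 10 deaths (8 caught, 2 missed) and 22 survivors (18 correct).
y_true = [1] * 10 + [0] * 22
y_pred = [1] * 8 + [0] * 2 + [0] * 18 + [1] * 4

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
toy_recall = recall(tp, fn)                   # 8 / (8 + 2)
toy_npv = negative_predictive_value(tn, fn)   # 18 / (18 + 2)
```

Both metrics are penalized by false negatives, so high values indicate that few patients who later died were classified as low risk.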
Conclusions
Our proposed CAD-SS, which utilizes an interpretable and causality-aware lightweight GCN model, demonstrated reasonably high performance in predicting mortality due to CAD. This unique system can help identify the most vulnerable patients.
About the journal:
The Journal of Biomedical Informatics reflects a commitment to high-quality original research papers, reviews, and commentaries in the area of biomedical informatics methodology. Although we publish articles motivated by applications in the biomedical sciences (for example, clinical medicine, health care, population health, and translational bioinformatics), the journal emphasizes reports of new methodologies and techniques that have general applicability and that form the basis for the evolving science of biomedical informatics. Articles on medical devices; evaluations of implemented systems (including clinical trials of information technologies); or papers that provide insight into a biological process, a specific disease, or treatment options would generally be more suitable for publication in other venues. Papers on applications of signal processing and image analysis are often more suitable for biomedical engineering journals or other informatics journals, although we do publish papers that emphasize the information management and knowledge representation/modeling issues that arise in the storage and use of biological signals and images. System descriptions are welcome if they illustrate and substantiate the underlying methodology that is the principal focus of the report and an effort is made to address the generalizability and/or range of application of that methodology. Note also that, given the international nature of JBI, papers that deal with specific languages other than English, or with country-specific health systems or approaches, are acceptable for JBI only if they offer generalizable lessons that are relevant to the broad JBI readership, regardless of their country, language, culture, or health system.