Journal of Biomedical Informatics: Latest Articles, Page 7

From GPT to DeepSeek: Significant gaps remain in realizing AI in healthcare

IF 4, Zone 2 (Medicine)

Journal of Biomedical Informatics, Pub Date: 2025-03-01, DOI: 10.1016/j.jbi.2025.104791

Yifan Peng, Bradley A. Malin, Justin F. Rousseau, Yanshan Wang, Zihan Xu, Xuhai Xu, Chunhua Weng, Jiang Bian

Citations: 0

ieGENES: A machine learning method for selecting differentially expressed genes in cancer studies

IF 4, Zone 2 (Medicine)

Journal of Biomedical Informatics, Pub Date: 2025-02-28, DOI: 10.1016/j.jbi.2025.104803

Xiao-Lei Xia, Shang-Ming Zhou, Yunguang Liu, Na Lin, Ian M. Overton

Volume 164, Article 104803

Abstract: Gene selection is crucial for cancer classification using microarray data. To improve cancer classification accuracy, we developed a new wrapper method for gene selection called ieGENES. First, we proposed a parsimonious kernel machine regularization (PKMR) model that uses ridge regularization in kernel-machine-driven classification to tackle multi-collinearity and obtain stable estimates in high-dimensional settings. Then the ieGENES algorithm was developed to optimally identify relevant genes while iteratively eliminating redundant ones based on leave-one-out cross-validation accuracy. In particular, we developed a new methodology to optimally update model parameters upon gene removal. The ieGENES algorithm was evaluated on six cancer microarray datasets and compared to existing methods. Classification accuracy and the number of differentially expressed genes (DEGs) identified were assessed. In terms of gene selection accuracy, ieGENES outperformed multiple wrapper methods on 5 out of 6 datasets (Colon, Leukemia, Hepato, Glioma, and Breast Cancers), with statistically significant improvements (p < 0.001). For the Colon dataset, ieGENES achieved 96.21% accuracy with 167 DEGs. The proposed ieGENES technique demonstrated superior performance in identifying DEGs for cancer diagnosis compared with existing techniques. It offers a promising tool for identifying biologically relevant genes in microarray data analysis and biomarker discovery for cancer research.

Citations: 0

CECRel: A joint entity and relation extraction model for Chinese electronic medical records of coronary angiography via contrastive learning

IF 4, Zone 2 (Medicine)

Journal of Biomedical Informatics, Pub Date: 2025-02-24, DOI: 10.1016/j.jbi.2025.104792

Yetao Tong, Jijun Tong, Shudong Xia, Qingli Zhou, Yuqiang Shen

Volume 164, Article 104792

Abstract: Entity and relation extraction from Chinese electronic medical records (EMRs) is a crucial foundation for constructing medical knowledge graphs and supporting downstream tasks. Chinese EMRs face challenges in accurately extracting medical entity relations due to limited data and the complexity of overlapping medical relationships. We propose CECRel, a joint extraction model for Chinese EMR entity relations based on contrastive learning and feature enhancement to address this issue. CECRel employs data augmentation strategies to generate positive and negative samples for contrastive loss computation and utilizes a feature enhancement module to enrich textual structural features, enabling the accurate extraction of complex relational triples. Experiments conducted on our constructed dataset, CACMeD, demonstrated that the model achieves an accuracy of 80.56%, a recall of 74.69%, and an F1 score of 77.51%. Furthermore, in the Baidu DuIE dataset, the model achieved an accuracy of 79.71%, a recall of 74.14%, and an F1 score of 76.82%, demonstrating that the proposed model is competitive among existing models.
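As a quick consistency check on the figures above, the F1 score is the harmonic mean of precision and recall. The sketch below assumes that the value the abstract calls "accuracy" plays the role of precision; that reading is an assumption, not something the abstract states. The function name `f1_score` and the variable names are illustrative only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures reported in the abstract, in percent.
# Assumption: the reported "accuracy" is used here as precision.
cacmed_f1 = f1_score(80.56, 74.69)
duie_f1 = f1_score(79.71, 74.14)

print(round(cacmed_f1, 2), round(duie_f1, 2))
```

Under that assumption, both results round to the F1 scores reported for CACMeD and DuIE, which suggests the three reported numbers per dataset are internally consistent.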

Citations: 0

Network-based analysis of Alzheimer’s Disease genes using multi-omics network integration with graph diffusion

IF 4, Zone 2 (Medicine)

Journal of Biomedical Informatics, Pub Date: 2025-02-22, DOI: 10.1016/j.jbi.2025.104797

Softya Sebastian, Swarup Roy, Jugal Kalita

Volume 164, Article 104797

Abstract: Alzheimer’s Disease (AD) is a complex neurodegenerative disorder affecting millions worldwide. Despite extensive research, the mechanisms behind AD remain elusive. Many studies suggest that disease-responsible genes often act as hub genes in biological networks. However, this assumption requires further investigation in the context of AD. To examine the network characteristics of known AD genes, it is crucial to construct a highly confident network, which is challenging to achieve using a single data source. This work integrates multi-omics networks inferred from microarray, single-cell RNA sequencing, and single-nuclei RNA sequencing expression data, weighted with protein interaction and gene ontology information. We generate a high-quality integrated network by utilizing various inference methods and combining them through a graph diffusion-based integration approach. This network is then analyzed to investigate the properties of known AD-specific genes. Our findings reveal that AD genes are not always high-degree or central hub nodes in the network. Instead, these genes are distributed across different quartiles of degree centrality while maintaining significant interconnections for effective regulation. Furthermore, our study highlights that peripheral genes, often overlooked, also play crucial roles by connecting to relevant disease nodes and hub genes. These findings challenge the conventional understanding that AD-responsible genes are primarily the hub genes in the network, offering new insights into the complex regulatory mechanisms of AD and suggesting novel directions for future research.
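To make the "quartiles of degree centrality" idea concrete, here is a minimal sketch of how nodes in an undirected network could be binned by degree and a set of flagged genes looked up. It is not the authors' pipeline: the toy edge list, the node names, the `flagged` set, and the rank-based quartile rule are all hypothetical. Normalised degree centrality is degree divided by (n - 1), which preserves the ordering, so ranking by raw degree gives the same quartiles.

```python
from collections import defaultdict


def degree_quartiles(edges):
    """Map each node of an undirected graph to a degree quartile (1 = lowest, 4 = highest)."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    n = len(neighbours)
    # Rank by degree; ties are broken by node name so the result is deterministic.
    ranked = sorted(neighbours, key=lambda node: (len(neighbours[node]), node))
    return {node: 1 + (rank * 4) // n for rank, node in enumerate(ranked)}


# Hypothetical toy network: g and h are hubs, a and b are peripheral.
edges = [
    ("g", "h"), ("g", "e"), ("g", "f"), ("g", "c"),
    ("h", "e"), ("h", "f"), ("h", "d"),
    ("e", "a"), ("f", "b"), ("c", "d"),
]
flagged = {"a", "d", "g"}  # hypothetical stand-ins for known disease genes

quartiles = degree_quartiles(edges)
flagged_quartiles = {gene: quartiles[gene] for gene in sorted(flagged)}
print(flagged_quartiles)
```

In this toy case the flagged nodes fall in the lowest, second, and highest quartiles, which is the kind of spread the abstract describes for known AD genes.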

Citations: 0

A comprehensive validation study on the influencing factors of cough-based COVID-19 detection through multi-center data with abundant metadata

IF 4, Zone 2 (Medicine)

Journal of Biomedical Informatics, Pub Date: 2025-02-22, DOI: 10.1016/j.jbi.2025.104798

Jiakun Shen, Xueshuai Zhang, Yanfen Tang, Pengyuan Zhang, Yonghong Yan, Pengfei Ye, Shaoxing Zhang, Zhihua Huang

Volume 164, Article 104798

Objective: In recent years, COVID-19 has placed enormous burdens on healthcare systems. Currently, hundreds of thousands of new cases are reported monthly. The World Health Organization is managing COVID-19 as a long-term disease, indicating that an efficient and low-cost detection method remains necessary. Previous studies have shown competitive results on cough-based COVID-19 detection combined with deep learning methods. However, most studies have focused only on improving classification performance on single-source data while neglecting the impact of various factors in real-world applications.

Methods: To this end, we collected clinical and large-scale crowdsourced cough audios with abundant metadata to comprehensively validate the performance differences among different groups. Specifically, we leveraged self-supervised learning for pre-training and fine-tuned the model with data from different sources. Then, based on the metadata, we compared the effects of factors such as cough types, symptoms, and infection stages on detection performance. Moreover, we recorded clinical indicators of viral load and antibody levels and observed the correlation between predicted probabilities and indicator values for the first time. Several open-source datasets were tested to verify the model generalizability.

Results: The area under the receiver operating characteristic curve is 0.79 for clinical data and 0.69 for crowdsourced data, indicating differences between clinical validation and real-world application. The performance in detecting symptomatic COVID-19 subjects is usually better than in detecting asymptomatic COVID-19 subjects. The prediction results show weak correlation with clinical indicators on a small number of clinical data. Poor detection performance in recovered individuals and open-source datasets shows a limitation of existing cough-based detection models.

Conclusion: Our study validated the model performance and limitations using multi-source data with abundant metadata, which helped researchers evaluate the feasibility of cough-based COVID-19 detection models in practical applications.
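The per-source comparison reported above (one AUROC for clinical data, another for crowdsourced data) can be illustrated with a small sketch. It computes the area under the ROC curve as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, then repeats that per metadata group. The records, group names, and function names are hypothetical and are not the study's data or code.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auroc_by_group(records):
    """records: iterable of (group, label, score); returns {group: AUROC}."""
    grouped = {}
    for group, label, score in records:
        labels, scores = grouped.setdefault(group, ([], []))
        labels.append(label)
        scores.append(score)
    return {group: auroc(labels, scores) for group, (labels, scores) in grouped.items()}


# Hypothetical predictions: (data source, true label, predicted probability).
records = [
    ("clinical", 1, 0.9), ("clinical", 1, 0.6), ("clinical", 0, 0.7), ("clinical", 0, 0.2),
    ("crowdsourced", 1, 0.8), ("crowdsourced", 1, 0.3),
    ("crowdsourced", 0, 0.5), ("crowdsourced", 0, 0.4),
]
per_group = auroc_by_group(records)
print(per_group)
```

The same grouping key could just as well be symptom status or infection stage; the pairwise formulation is quadratic in sample count, so a rank-based implementation would be preferable for large datasets.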

Citations: 0

Missing-modality enabled multi-modal fusion architecture for medical data

IF 4, Zone 2 (Medicine)

Journal of Biomedical Informatics, Pub Date: 2025-02-21, DOI: 10.1016/j.jbi.2025.104796

Muyu Wang, Shiyu Fan, Yichen Li, Zhongrang Xie, Hui Chen

Volume 164, Article 104796

Background: Fusion of multi-modal data can improve the performance of deep learning models. However, missing modalities are common in medical data due to patient specificity, which is detrimental to the performance of multi-modal models in applications. Therefore, it is critical to adapt the models to missing modalities.

Objective: This study aimed to develop an effective multi-modal fusion architecture for medical data that was robust to missing modalities and further improved the performance for clinical tasks.

Methods: X-ray chest radiographs for the image modality, radiology reports for the text modality, and structured value data for the tabular data modality were fused in this study. Each modality pair was fused with a Transformer-based bi-modal fusion module, and the three bi-modal fusion modules were then combined into a tri-modal fusion framework. Additionally, multivariate loss functions were introduced into the training process to improve models’ robustness to missing modalities during the inference process. Finally, we designed comparison and ablation experiments to validate the effectiveness of the fusion, the robustness to missing modalities, and the enhancements from each key component. Experiments were conducted on the MIMIC-IV and MIMIC-CXR datasets with the 14-label disease diagnosis and patient in-hospital mortality prediction tasks. The area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC) were used to evaluate models’ performance.

Results: Our proposed architecture showed superior predictive performance, achieving average AUROC and AUPRC of 0.916 and 0.551 in the 14-label classification task and 0.816 and 0.392 in the mortality prediction task, while the best average AUROC and AUPRC among the comparison methods were 0.876 and 0.492 in the 14-label classification task and 0.806 and 0.366 in the mortality prediction task. Both metrics decreased only slightly when tested with modal-incomplete data. Different levels of enhancements were achieved through three key components.

Conclusions: The proposed multi-modal fusion architecture effectively fused three modalities and showed strong robustness to missing modalities. This architecture holds promise for scaling up to more modalities to enhance the clinical practicality of the model.

Citations: 0

Imputation of missing aggregate EHR audit log data across individual and multiple organizations

IF 4, Zone 2 (Medicine)

Journal of Biomedical Informatics, Pub Date: 2025-02-18, DOI: 10.1016/j.jbi.2025.104805

Huan Li, Nate C. Apathy, A Jay Holmgren, Edward R. Melnick, Robert A. McDougal

Volume 163, Article 104805

Objective: To compare naive versus machine learning imputation strategies’ efficacy for imputing missing data in EHR-vendor generated data, explore subgrouping criteria, and evaluate performance and feasibility for in-house implementation.

Materials and Methods: Missing data imputation experiments involving various types and sizes of organizations were conducted using physician-only aggregate EHR audit log data. Organizations were categorized by teaching status. Based on the coefficient of variation and missing percentage, variables were classified into three categories before imputation. The model with the highest R² value was selected as the most robust option.

Results: Teaching and non-teaching organizations showed similar R² trends in model selection, though some differences existed within each class. Moreover, the rolling average provided more consistent R² results across various organization sizes, especially for medium and small-sized organizations. XGBoost performed slightly better in large organizations than in small organizations. Comparisons between single- and multi-site organizations revealed higher R² values for single organizations using their own data for imputation as opposed to merging.

Discussion/Conclusion: The study introduced a systematic method for classifying variables and determining the best imputation strategy for each class. It also tested the scalability of this approach for individual organizations. Organizations can effectively use this method, including variable classification and tailored imputation methods. Organization size did not significantly affect the imputation process. The rolling average time-series method outperformed the machine learning method, which used non-time-series approaches. Combining data from diverse sites does not necessarily improve machine learning imputation.
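The evaluation pattern described above (hide known values, impute them, score the imputations with R²) can be sketched in a few lines. This is an illustration under stated assumptions, not the study's implementation: the trailing-window definition of the rolling average, the window size, the toy series, and the function names are all hypothetical.

```python
def rolling_mean_impute(series, window=3):
    """Fill None entries with the mean of up to `window` preceding values.

    Preceding values may be observed or previously imputed. A gap with no
    usable history (for example at the start of the series) is left as None.
    """
    filled = []
    for value in series:
        if value is None:
            history = [v for v in filled[-window:] if v is not None]
            value = sum(history) / len(history) if history else None
        filled.append(value)
    return filled


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1.0 - ss_res / ss_tot


# Hypothetical complete series of a periodic audit-log measure.
complete = [10, 12, 11.5, 14, 15, 14, 12, 13, 12]
held_out = [2, 5, 8]  # positions hidden to simulate missing data

masked = [None if i in held_out else v for i, v in enumerate(complete)]
imputed = rolling_mean_impute(masked, window=2)

actual = [complete[i] for i in held_out]
predicted = [imputed[i] for i in held_out]
score = r_squared(actual, predicted)
print(predicted, round(score, 3))
```

Because the rolling average only needs an organization's own history, it can be run per site without pooling data, which fits the abstract's observation that merging sites did not necessarily help.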

Citations: 0

A multimodal machine learning algorithm improved diagnostic accuracy for otitis media in a school aged Aboriginal population

IF 4, Zone 2 (Medicine)

Journal of Biomedical Informatics, Pub Date: 2025-02-17, DOI: 10.1016/j.jbi.2025.104801

Jacqueline H. Stephens, Phong Phu Nguyen, Amanda Machell, Linnett Sanchez, Eng H. Ooi, A. Simon Carney, Trent Lewis

Volume 164, Article 104801

Objective: Otitis Media (OM) – ear infection – can lead to hearing loss and associated developmental delay. There are several subgroups of OM which can be difficult to diagnose accurately, even for experienced clinicians. AI and machine learning algorithms for OM diagnosis are evolving but typically only focus on one defined diagnostic feature of OM. This study aimed to establish if combining otoscopic and tympanometry data improves the diagnostic accuracy of a ML algorithm for diagnosing OM and its various subgroups.

Methods: We used an existing dataset containing data from 813 school-aged children (aged five to eight years) from 10 Aboriginal communities in remote South Australia. Data were collected between 2009 and 2011. All children underwent video otoscopy and tympanometry assessment of both ears and diagnosis of OM was made by otorhinolaryngology (ENT) surgeons. After data augmentation and preprocessing, the database contained 15,057 samples with matched video otoscopy and tympanometry data (normal: n = 8,239; abnormal: n = 6,746). Support Vector Machine models were used to build the ML system.

Results: By combining tympanometry data with the probability prediction of the single otoscopy model, the accuracy of the system increased from 78% (otoscopy data) to 82% (otoscopy and tympanometry data).

Conclusion: Compared to diagnosis based solely on otoscopy data, combining otoscopy and tympanometry data increased the diagnostic accuracy of the ML algorithm. This approach could be used to support the accurate diagnosis of OM in children and can facilitate timely and appropriate treatment, especially in rural and remote areas.

Citations: 0

Developing libraries of semantically-augmented graphics as visual standards for biomedical information systems

IF 4, Zone 2 (Medicine)

Journal of Biomedical Informatics, Pub Date: 2025-02-16, DOI: 10.1016/j.jbi.2025.104804

Melissa D. Clarkson, Steven Roggenkamp, Landon T. Detwiler

Volume 163, Article 104804

Objective: Visual representations generally serve as supplements to information, rather than as bearers of computable information themselves. Our objective is to develop a method for creating semantically-augmented graphic libraries that will serve as visual standards and can be implemented as visual assets in intelligent information systems.

Methods: Graphics were developed using a composable approach and specified using SVG. OWL was used to represent the entities of our system, which include elements, units, graphics, graphic libraries, and library collections. A graph database serves as our data management system. Semantics are applied at multiple levels: (a) each element is associated with a semantic style class to link visual style to semantic meaning, (b) graphics are described using object properties and data properties, (c) relationships are specified between graphics, and (d) mappings are made between the graphics and outside resources.

Results: The Graphic Library web application enables users to browse the libraries, view information pages for each graphic, and download individual graphics. We demonstrate how SPARQL can be employed to query the graphics database and the APIs can be used to retrieve the graphics and associated data for applications. In addition, this work shows that our method of designing composable graphics is well-suited to depicting variations in human anatomy.

Conclusion: This work provides a bridge between visual communication and the field of knowledge representation. We demonstrate a method for creating visual standards that are compatible with practices in biomedical ontology and implement a system for making them accessible to information systems.

Citations: 0

Enhanced heart failure mortality prediction through model-independent hybrid feature selection and explainable machine learning

IF 4, Zone 2 (Medicine)

Journal of Biomedical Informatics, Pub Date: 2025-02-15, DOI: 10.1016/j.jbi.2025.104800

Georgios Petmezas, Vasileios E. Papageorgiou, Vassilios Vassilikos, Efstathios Pagourelias, Dimitrios Tachmatzidis, George Tsaklidis, Aggelos K. Katsaggelos, Nicos Maglaveras

Volume 163, Article 104800

Abstract: Heart failure (HF) remains a significant public health challenge with high mortality rates. Machine learning (ML) techniques offer a promising approach to predict HF mortality, potentially improving clinical outcomes. However, the effectiveness of these techniques heavily depends on the quality and relevance of the features used. This study introduces a novel hybrid feature selection methodology that combines Extremely Randomized Trees (Extra-Trees) and non-linear correlation measures to enhance 1-year all-cause mortality prediction in HF patients using echocardiographic and key demographic data. Unlike existing feature selection methods that are often tied to specific ML models and produce inconsistent feature sets across different algorithms, our proposed approach is model-independent, ensuring robustness and generalizability. Moreover, the optimal number of predictive features is identified through loss graph inspection, leading to a compact and highly informative subset of seven features. We trained and evaluated seven widely-used ML models on both the full feature set and the selected subset, finding that most models maintained or improved their predictive performance despite an 80% reduction in features. Model interpretability was enhanced using SHapley Additive exPlanations (SHAP), allowing for a detailed examination of how individual features influence predictions. To further assess its effectiveness, we compared our methodology against widely known feature selection techniques across all seven ML models. The results underscore the superiority of our proposed feature set in accurately predicting HF mortality over conventional methods, offering new opportunities for personalized management strategies based on a streamlined and explainable feature subset.

Citations: 0