Feature selection integrating Shapley values and mutual information in reinforcement learning: An application in the prediction of post-operative outcomes in patients with end-stage renal disease
IF 4.9 | Zone 2 (Medicine) | JCR Q1, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS
DOI: 10.1016/j.cmpb.2024.108416 | Journal: Computer methods and programs in biomedicine | Published: 2024-09-21 | Type: Journal Article | URL: https://www.sciencedirect.com/science/article/pii/S0169260724004097
Citations: 0
Abstract
Background:
In predicting post-operative outcomes for patients with end-stage renal disease, our study faced challenges related to class imbalance and a high-dimensional feature space. Therefore, with a focus on overcoming class imbalance and improving interpretability, we propose a novel feature selection approach using multi-agent reinforcement learning.
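The abstract does not say how the agents are organized. A common design in multi-agent feature selection, assumed here, gives each feature its own agent with a binary keep/drop action, and each agent learns from the reward assigned to it. The sketch below uses epsilon-greedy, one-step (bandit-style) value updates. `FeatureAgent`, `run_feature_selection`, `INFORMATIVE`, and `toy_reward` are hypothetical names for illustration, not the authors' implementation.

```python
import random


class FeatureAgent:
    """Hypothetical agent for one feature: action 1 = keep it, 0 = drop it."""

    def __init__(self, lr=0.1, epsilon=0.1):
        self.q = [0.0, 0.0]  # estimated value of [drop, keep]
        self.lr = lr
        self.epsilon = epsilon

    def act(self, rng):
        # Epsilon-greedy: explore a random action, otherwise pick the better one (ties -> drop)
        if rng.random() < self.epsilon:
            return rng.randint(0, 1)
        return 0 if self.q[0] >= self.q[1] else 1

    def update(self, action, reward):
        # One-step value update toward the observed reward (no bootstrapping)
        self.q[action] += self.lr * (reward - self.q[action])


def run_feature_selection(n_features, reward_fn, episodes=500, seed=0):
    """Train one agent per feature; reward_fn maps joint actions to one reward per agent."""
    rng = random.Random(seed)
    agents = [FeatureAgent() for _ in range(n_features)]
    for _ in range(episodes):
        actions = [agent.act(rng) for agent in agents]
        rewards = reward_fn(actions)
        for agent, action, reward in zip(agents, actions, rewards):
            agent.update(action, reward)
    # Final greedy subset: features whose agents value "keep" above "drop"
    return [i for i, agent in enumerate(agents) if agent.q[1] > agent.q[0]]


# Toy usage (made-up setting): features 0 and 2 are assumed informative.
INFORMATIVE = {0, 2}


def toy_reward(actions):
    # +1 for keeping an informative feature, -1 for keeping a noisy one, 0 for dropping
    return [
        (1.0 if i in INFORMATIVE else -1.0) if a == 1 else 0.0
        for i, a in enumerate(actions)
    ]


print(run_feature_selection(4, toy_reward))
```

Giving every agent its own reward, rather than one shared scalar, is what lets per-feature terms such as mutual information or Shapley attributions steer individual keep/drop decisions.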
Methods:
We propose a multi-agent feature selection model based on a composite reward function that combines classification model performance, Shapley additive explanations (SHAP) values, and mutual information. How rewards are defined in reinforcement learning is crucial for model convergence and performance. First, we set a deterministic reward based on the mutual information between each variable and the target class, which favors variables that are highly dependent on the class and thus accelerates convergence. We then prioritized variables that influence the minority class at the sample level and introduced a dynamic reward-distribution strategy based on SHAP values to improve interpretability and address the class imbalance problem.
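The reward design above can be sketched as follows, under stated assumptions. Mutual information between each discrete feature and the label is computed once from the data (the deterministic term). Per-feature Shapley attributions are then averaged over minority-class samples only (the dynamic, sample-level term). Libraries such as SHAP use model-specific or approximate algorithms for real models; here exact Shapley values are computed by brute-force enumeration for a hypothetical per-sample value function, which is only feasible for a handful of features. The abstract does not state how the three terms are weighted or combined, so the linear combination in `agent_rewards`, with hypothetical weights `alpha`, `beta`, and `gamma`, is an assumption.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(xs, ys):
    """Empirical I(X; Y) in nats for two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # Sum over observed (x, y) of p(x, y) * log(p(x, y) / (p(x) p(y))), written with counts
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y])) for (x, y), c in joint.items()
    )


def exact_shapley(features, value_fn):
    """Exact Shapley values by enumerating every coalition (only feasible for few features)."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of size k
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi


def minority_importance(samples, labels, minority_label, features, sample_value_fn):
    """Mean |Shapley value| per feature, averaged over minority-class samples only."""
    minority = [s for s, y in zip(samples, labels) if y == minority_label]
    totals = {f: 0.0 for f in features}
    for s in minority:
        phi = exact_shapley(features, lambda coalition: sample_value_fn(s, coalition))
        for f in features:
            totals[f] += abs(phi[f])
    return {f: t / len(minority) for f, t in totals.items()}


def agent_rewards(selected, performance, mi, shap, alpha=1.0, beta=0.5, gamma=0.5):
    """Hypothetical composite reward for each selected feature's agent."""
    # performance: shared classifier score for this episode (e.g. minority-class F1)
    # mi: fixed per-feature mutual information (deterministic term, computed once)
    # shap: per-feature minority-class importance (dynamic term, recomputed per episode)
    return {f: alpha * performance + beta * mi[f] + gamma * shap[f] for f in selected}
```

Computing the mutual-information term once keeps it cheap and stable, while recomputing the Shapley term each episode lets the reward follow whichever features currently drive minority-class predictions.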
Results:
Using integrated data from electronic medical records, anesthesia records, operating room vital signs, and pre-operative anesthesia evaluations, our approach effectively mitigated class imbalance and showed superior performance in the ablation analysis. Compared with the baseline model without feature selection, our model improved the minority-class F1 score by 16% and the overall F1 score by 8.2%.
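Because the results are reported as a minority-class F1 score and an overall F1 score, the short sketch below shows how the two are computed and why they can diverge on imbalanced labels. It assumes binary labels and treats "overall" as the macro average over classes; the abstract does not specify the averaging, and a support-weighted average would give different numbers. The labels are made up for illustration.

```python
def f1_for_class(y_true, y_pred, label):
    """F1 score treating `label` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so the minority class counts as much as the majority."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in labels) / len(labels)


# Toy imbalanced example: class 1 is the minority (2 of 8 samples).
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 0]
print(f1_for_class(y_true, y_pred, 1), macro_f1(y_true, y_pred))
```

On this toy data the minority-class F1 is 0.5 and the majority-class F1 is 5/6, so the macro F1 is about 0.67; a model can look acceptable overall while still doing poorly on the minority class.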
Conclusion:
These findings indicate that multi-agent-based feature selection can be a promising approach for addressing the class imbalance problem.
Journal description:
To encourage the development of formal computing methods, and their application in biomedical research and medical practice, by illustration of fundamental principles in biomedical informatics research; to stimulate basic research into application software design; to report the state of research of biomedical information processing projects; to report new computer methodologies applied in biomedical areas; the eventual distribution of demonstrable software to avoid duplication of effort; to provide a forum for discussion and improvement of existing software; to optimize contact between national organizations and regional user groups by promoting an international exchange of information on formal methods, standards and software in biomedicine.
Computer Methods and Programs in Biomedicine covers computing methodology and software systems derived from computing science for implementation in all aspects of biomedical research and medical practice. It is designed to serve: biochemists; biologists; geneticists; immunologists; neuroscientists; pharmacologists; toxicologists; clinicians; epidemiologists; psychiatrists; psychologists; cardiologists; chemists; (radio)physicists; computer scientists; programmers and systems analysts; biomedical, clinical, electrical and other engineers; teachers of medical informatics and users of educational software.