Maria Carmela Groccia, Rosita Guido, Domenico Conforti, Corrado Pelaia, Giuseppe Armentaro, Alfredo Francesco Toscani, Sofia Miceli, Elena Succurro, Marta Letizia Hribal, Angela Sciacqua
{"title":"Cost-Sensitive Models to Predict Risk of Cardiovascular Events in Patients with Chronic Heart Failure","authors":"Maria Carmela Groccia, Rosita Guido, Domenico Conforti, Corrado Pelaia, Giuseppe Armentaro, Alfredo Francesco Toscani, Sofia Miceli, Elena Succurro, Marta Letizia Hribal, Angela Sciacqua","doi":"10.3390/info14100542","DOIUrl":null,"url":null,"abstract":"Chronic heart failure (CHF) is a clinical syndrome characterised by symptoms and signs due to structural and/or functional abnormalities of the heart. CHF confers risk for cardiovascular deterioration events which cause recurrent hospitalisations and high mortality rates. The early prediction of these events is very important to limit serious consequences, improve the quality of care, and reduce its burden. CHF is a progressive condition in which patients may remain asymptomatic before the onset of symptoms, as observed in heart failure with a preserved ejection fraction. The early detection of underlying causes is critical for treatment optimisation and prognosis improvement. To develop models to predict cardiovascular deterioration events in patients with chronic heart failure, a real dataset was constructed and a knowledge discovery task was implemented in this study. The dataset is imbalanced, as it is common in real-world applications. It thus posed a challenge because imbalanced datasets tend to be overwhelmed by the abundance of majority-class instances during the learning process. To address the issue, a pipeline was developed specifically to handle imbalanced data. Different predictive models were developed and compared. To enhance sensitivity and other performance metrics, we employed multiple approaches, including data resampling, cost-sensitive methods, and a hybrid method that combines both techniques. These methods were utilised to assess the predictive capabilities of the models and their effectiveness in handling imbalanced data. By using these metrics, we aimed to identify the most effective strategies for achieving improved model performance in real scenarios with imbalanced datasets. The best model for predicting cardiovascular events achieved mean a sensitivity 65%, a mean specificity 55%, and a mean area under the curve of 0.71. The results show that cost-sensitive models combined with over/under sampling approaches are effective for the meaningful prediction of cardiovascular events in CHF patients.","PeriodicalId":38479,"journal":{"name":"Information (Switzerland)","volume":"113 1","pages":"0"},"PeriodicalIF":2.4000,"publicationDate":"2023-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information (Switzerland)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/info14100542","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
Chronic heart failure (CHF) is a clinical syndrome characterised by symptoms and signs due to structural and/or functional abnormalities of the heart. CHF confers risk for cardiovascular deterioration events which cause recurrent hospitalisations and high mortality rates. The early prediction of these events is very important to limit serious consequences, improve the quality of care, and reduce its burden. CHF is a progressive condition in which patients may remain asymptomatic before the onset of symptoms, as observed in heart failure with a preserved ejection fraction. The early detection of underlying causes is critical for treatment optimisation and prognosis improvement. To develop models to predict cardiovascular deterioration events in patients with chronic heart failure, a real dataset was constructed and a knowledge discovery task was implemented in this study. The dataset is imbalanced, as it is common in real-world applications. It thus posed a challenge because imbalanced datasets tend to be overwhelmed by the abundance of majority-class instances during the learning process. To address the issue, a pipeline was developed specifically to handle imbalanced data. Different predictive models were developed and compared. To enhance sensitivity and other performance metrics, we employed multiple approaches, including data resampling, cost-sensitive methods, and a hybrid method that combines both techniques. These methods were utilised to assess the predictive capabilities of the models and their effectiveness in handling imbalanced data. By using these metrics, we aimed to identify the most effective strategies for achieving improved model performance in real scenarios with imbalanced datasets. The best model for predicting cardiovascular events achieved mean a sensitivity 65%, a mean specificity 55%, and a mean area under the curve of 0.71. The results show that cost-sensitive models combined with over/under sampling approaches are effective for the meaningful prediction of cardiovascular events in CHF patients.