Mauricio Rodríguez Segura, O. Nicolis, Billy Peralta Márquez, Juan Carrillo Azócar
2020 39th International Conference of the Chilean Computer Science Society (SCCC), 2020-11-16. DOI: 10.1109/SCCC51225.2020.9281168 (https://doi.org/10.1109/SCCC51225.2020.9281168)
Predicting cardiovascular disease by combining optimal feature selection methods with machine learning
Cardiovascular disease (CVD) is one of the main causes of death in the world, and early detection could prevent deaths associated with cardiac problems. In this work, we propose a methodology based on data pre-processing and Machine Learning (ML) techniques for predicting cardiovascular disease using the Sleep Heart Health Study (SHHS) dataset. First, principal component analysis and lowest p-value logistic regression are applied to select the optimal features that could be related to CVD. Then, the selected features are used to train four ML algorithms: Naïve Bayes (NB), Feed Forward Neural Networks (NN), Support Vector Machine (SVM) and Random Forest (RF). The output of the proposed models is a binary feature, and SMOTE sampling is used to balance the training set. Among the proposed methods, NN provided the best accuracy (0.81) and AUC (0.76), outperforming the results obtained in other studies.
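To illustrate the balancing step mentioned above, the following is a minimal sketch of SMOTE-style oversampling in plain Python. It is not the authors' implementation: the function names `smote` and `balance_training_set`, the default `k=5`, the 0/1 label encoding, and the toy data are all assumptions made for illustration. The sketch picks base samples at random rather than cycling through every minority sample, which is a common variant of the original procedure. Only the training set is resampled, as the abstract states.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    minority: list of equal-length feature vectors (lists of floats).
    k: number of nearest minority neighbours considered per sample.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # For every minority sample, store the indices of its k nearest
    # minority neighbours (Euclidean distance, excluding the sample itself).
    neighbours = []
    for i, x in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(x, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))   # random base sample
        j = rng.choice(neighbours[i])      # one of its nearest neighbours
        gap = rng.random()                 # position on the segment x_i -> x_j
        x, z = minority[i], minority[j]
        synthetic.append([a + gap * (b - a) for a, b in zip(x, z)])
    return synthetic


def balance_training_set(X, y, k=5, seed=0):
    """Oversample the smaller class (labels assumed 0/1) until both are equal."""
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    minority, minority_label = (pos, 1) if len(pos) < len(neg) else (neg, 0)
    n_new = abs(len(pos) - len(neg))
    new_points = smote(minority, n_new, k=k, seed=seed) if n_new else []
    # Build new lists so the caller's training data is left untouched.
    return X + new_points, y + [minority_label] * n_new


# Toy training set (hypothetical values): four negatives, two positives.
X_train = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
y_train = [0, 0, 0, 0, 1, 1]
X_bal, y_bal = balance_training_set(X_train, y_train)
print(sum(y_bal), len(y_bal) - sum(y_bal))  # prints: 4 4
```

Each synthetic point lies on the line segment between a minority sample and one of its nearest minority neighbours, so the new cases stay inside the region already occupied by that class. The test set should be left unbalanced so that the reported accuracy and AUC reflect the true class proportions.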