Addressing Class Imbalance in Healthcare Data: Machine Learning Solutions for Age-Related Macular Degeneration and Preeclampsia
Authors: Antonieta Martinez-Velasco; Lourdes Martínez-Villaseñor; Luis Miralles-Pechuán
Journal: IEEE Latin America Transactions (Impact Factor 1.3; JCR Q3, Computer Science, Information Systems)
Publication date: 2024-10-04
DOI: 10.1109/TLA.2024.10705995
URL: https://ieeexplore.ieee.org/document/10705995/
The use of machine learning in healthcare has transformed the way diseases are diagnosed and treatments are optimized. However, medical databases often lack balanced data due to challenges in data collection caused by privacy regulations. Certain health conditions are underrepresented, which hampers machine learning performance. To address this problem, a hybrid approach has been proposed that combines the Synthetic Minority Oversampling Technique (SMOTE) with undersampling and uses two specific techniques tailored for imbalanced datasets. Comparative evaluations with popular machine learning methods were conducted using various thresholds to reduce one class and employing Balanced Accuracy to mitigate bias toward the majority class. The results showed that Balanced Bagging and Balanced Random Forest consistently outperformed other methods, performing the best with an average ranking of 1.42 and 3.58 out of 32 configurations in the two datasets, respectively. Tree-based approaches such as Random Forest and Gradient Boosting demonstrated similar effectiveness, emphasizing the power of aggregating predictions from multiple trees to reduce bias. Notably, undersampling and SMOTE proved advantageous for non-tree-based models like KNN, SVM, and Logistic Regression, showcasing their usefulness across different algorithms. This study provides a robust solution for handling imbalanced datasets in healthcare, which could potentially optimize healthcare interventions and improve patient outcomes and care.
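The two ideas named in the abstract, Balanced Accuracy and a hybrid of SMOTE with undersampling, can be illustrated with a minimal standard-library Python sketch. This is not the authors' implementation: the function names (`balanced_accuracy`, `smote_like_oversample`, `hybrid_resample`), the toy two-feature data, and the threshold of 30 retained majority samples are all assumptions made for illustration, and the oversampler is a simplified SMOTE-style interpolation that assumes a binary problem with at least two minority samples.

```python
import random
from collections import Counter


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally regardless of size."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, label in enumerate(y_true) if label == cls]
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def smote_like_oversample(minority, n_new, k=3, rng=None):
    """Simplified SMOTE-style step (an assumption, not a library API):
    interpolate between a minority point and one of its k nearest
    minority neighbours. Requires at least two minority points."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


def hybrid_resample(X, y, minority_label, majority_keep, rng=None):
    """Randomly undersample the majority class down to majority_keep samples,
    then oversample the minority class up to the same size."""
    rng = rng or random.Random(0)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    majority = [(x, label) for x, label in zip(X, y) if label != minority_label]
    kept = rng.sample(majority, majority_keep)
    n_new = max(0, majority_keep - len(minority))
    synthetic = smote_like_oversample(minority, n_new, rng=rng)
    X_out = [x for x, _ in kept] + minority + synthetic
    y_out = [label for _, label in kept] + [minority_label] * (len(minority) + n_new)
    return X_out, y_out


# Hypothetical toy data: 90 majority (label 0) and 10 minority (label 1) samples.
data_rng = random.Random(42)
X = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(90)]
X += [(data_rng.gauss(3, 1), data_rng.gauss(3, 1)) for _ in range(10)]
y = [0] * 90 + [1] * 10

# A classifier that always predicts the majority class looks good on plain
# accuracy but is exposed by balanced accuracy.
always_majority = [0] * len(y)
plain_acc = sum(p == t for p, t in zip(always_majority, y)) / len(y)
bal_acc = balanced_accuracy(y, always_majority)
print(plain_acc, bal_acc)  # 0.9 0.5

# Hybrid resampling: keep 30 majority samples, synthesize 20 minority samples.
X_res, y_res = hybrid_resample(X, y, minority_label=1, majority_keep=30)
print(Counter(y_res))
```

The majority-only predictor scores 0.9 on plain accuracy yet only 0.5 on Balanced Accuracy, which is the bias the metric is meant to expose. In practice one would more likely rely on a maintained library than on a hand-written sampler; the sketch only makes the mechanics visible.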
About the journal:
IEEE Latin America Transactions (IEEE LATAM) is an interdisciplinary journal focused on the dissemination of original, high-quality research papers and review articles in Spanish and Portuguese on emerging topics in three main areas: Computing, Electric Energy, and Electronics. Sub-areas of the journal include, but are not limited to: automatic control, communications, instrumentation, artificial intelligence, power and industrial electronics, fault diagnosis and detection, transportation electrification, internet of things, electrical machines, circuits and systems, biomedicine and biomedical / haptic applications, secure communications, robotics, sensors and actuators, computer networks, and smart grids.