Muhammed Abdulhamid Karabiyik, Bahaeddin Turkoglu, Tunc Asuroglu
PeerJ Computer Science, volume 11, article e3177. Published 2025-09-02. DOI: 10.7717/peerj-cs.3177 (https://doi.org/10.7717/peerj-cs.3177). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453762/pdf/. Impact factor 2.5; JCR Q2 (Computer Science, Artificial Intelligence).
A cluster-assisted differential evolution-based hybrid oversampling method for imbalanced datasets.
Class imbalance remains a significant challenge in machine learning, leading to biased models that favor the majority class while failing to accurately classify minority instances. Traditional oversampling methods, such as Synthetic Minority Over-sampling Technique (SMOTE) and its variants, often struggle with class overlap, poor decision boundary representation, and noise accumulation. To address these limitations, this study introduces ClusterDEBO, a novel hybrid oversampling method that integrates K-Means clustering with differential evolution (DE) to generate synthetic samples in a more structured and adaptive manner. The proposed method first partitions the minority class into clusters, using the silhouette score to determine the optimal number of clusters. Within each cluster, DE-based mutation and crossover operations are applied to generate diverse and well-distributed synthetic samples while preserving the underlying data distribution. Additionally, a selective sampling and noise reduction mechanism filters out low-impact synthetic samples based on their contribution to classification performance. The effectiveness of ClusterDEBO is evaluated on 44 benchmark datasets using k-Nearest Neighbors (kNN), decision tree (DT), and support vector machine (SVM) classifiers. The results demonstrate that ClusterDEBO consistently outperforms existing oversampling techniques, leading to improved class separability and enhanced classifier robustness. Moreover, statistical validation using the Friedman test supports the significance of the improvements, indicating that the observed gains are unlikely to be due to random variation. The findings highlight the potential of cluster-assisted differential evolution as a powerful strategy for handling imbalanced datasets.
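The within-cluster generation step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which DE variant or parameter values ClusterDEBO uses, so the code below assumes the common DE/rand/1 mutation with binomial crossover. The function name `de_oversample_cluster` and the defaults `F=0.5` and `CR=0.9` are hypothetical choices for illustration, not the authors' implementation. The K-Means partitioning and the selective filtering stage are omitted; the input is assumed to be the minority points of a single, already identified cluster.

```python
import random


def de_oversample_cluster(cluster, n_new, F=0.5, CR=0.9, seed=0):
    """Generate n_new synthetic points from one minority-class cluster.

    Assumption: DE/rand/1 mutation with binomial crossover. F, CR and the
    function name are illustrative and not taken from the paper.
    """
    if len(cluster) < 4:
        raise ValueError("need at least 4 points: one target plus three donors")
    rng = random.Random(seed)
    dim = len(cluster[0])
    synthetic = []
    for _ in range(n_new):
        # Pick a target and three distinct donors, all from the same cluster.
        target, a, b, c = rng.sample(cluster, 4)
        # Mutation: v = a + F * (b - c)
        mutant = [a[j] + F * (b[j] - c[j]) for j in range(dim)]
        # Binomial crossover: each coordinate comes from the mutant with
        # probability CR; j_rand guarantees at least one mutant coordinate.
        j_rand = rng.randrange(dim)
        trial = [
            mutant[j] if (rng.random() < CR or j == j_rand) else target[j]
            for j in range(dim)
        ]
        synthetic.append(trial)
    return synthetic


# Toy cluster of five 2-D minority points (made-up values).
cluster = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.1], [1.1, 2.2], [1.0, 1.8]]
new_points = de_oversample_cluster(cluster, n_new=3)
```

Because all donors come from the same cluster, each synthetic coordinate stays within the cluster's range extended by at most `F` times that range, which is one plausible reading of how clustering keeps generated samples close to the local data distribution.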
About the journal:
PeerJ Computer Science is the new open access journal covering all subject areas in computer science, with the backing of a prestigious advisory board and more than 300 academic editors.