GDG: An evolutionary oversampling framework integrating Gaussian mixture modeling and genetic algorithm
Authors: Yelin Zhang, Dongmei Wang, Yuehua Yu, Chen Chen, Chengwang Xie
Journal: Swarm and Evolutionary Computation, volume 104, Article 102375 (impact factor 8.5; JCR Q1, Computer Science, Artificial Intelligence)
DOI: 10.1016/j.swevo.2026.102375
Published: 2026-04-01 (Epub 2026-04-06)
URL: https://www.sciencedirect.com/science/article/pii/S2210650226000957
Open access: no
Citation count: 0
Abstract
Class imbalance induces significant bias in machine learning classifiers. While oversampling mitigates this, a critical knowledge gap exists: conventional generative methods often assume unimodal distributions and fail to address complex boundary overlap, leading to noisy, low-fidelity synthetic samples. To bridge this gap, we propose GDG, a novel framework integrating a Gaussian Mixture Model (GMM) and a Genetic Algorithm (GA). First, the GMM clusters minority samples to accurately capture intrinsic multi-modal structures. Subsequently, an innovative global–local mechanism adaptively allocates synthetic samples based on boundary complexity, effectively minimizing overlap. Lastly, the GA performs a nonlinear search within superspheres, utilizing adaptive fitness weights to balance exploration and exploitation for high-quality generation. Extensive experiments on 21 benchmark datasets demonstrate that GDG significantly outperforms nine state-of-the-art baselines, improving average Accuracy by 1.9%, G-mean by 6.0%, and AUC by 1.2%. Rigorous non-parametric statistical analysis confirms these differences (p = 1.78 × 10⁻⁷), with post-hoc Nemenyi testing verifying that GDG achieves the superior average rank of 2.17. These findings establish GDG as a robust, statistically validated solution for tackling complex class imbalance problems.
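The abstract reports a larger gain in G-mean than in Accuracy. The sketch below illustrates why the two metrics can diverge on imbalanced data. It assumes the standard binary definition of G-mean (the geometric mean of sensitivity and specificity); the abstract does not state the exact formulation used in the paper, and the toy labels and helper names here are hypothetical, not taken from the paper.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fn, tn, fp); `positive` marks the minority class (assumed to be 1)."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return tp, fn, tn, fp


def accuracy(y_true, y_pred):
    """Fraction of all samples classified correctly."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fn + tn + fp)


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity (minority recall) and specificity (majority recall)."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Hypothetical toy data: 95 majority samples (0) followed by 5 minority samples (1).
y_true = [0] * 95 + [1] * 5

# Classifier A ignores the minority class entirely.
pred_majority_only = [0] * 100

# Classifier B finds 4 of the 5 minority samples at the cost of 10 false alarms.
pred_balanced = [1] * 10 + [0] * 85 + [1] * 4 + [0] * 1

acc_a, gm_a = accuracy(y_true, pred_majority_only), g_mean(y_true, pred_majority_only)
acc_b, gm_b = accuracy(y_true, pred_balanced), g_mean(y_true, pred_balanced)

print(f"A: accuracy={acc_a:.3f}, G-mean={gm_a:.3f}")
print(f"B: accuracy={acc_b:.3f}, G-mean={gm_b:.3f}")
```

In this toy case classifier A has the higher Accuracy but a G-mean of zero, because it never detects the minority class. This is the kind of bias that oversampling methods aim to reduce, and it is consistent with G-mean moving more than Accuracy when minority recall improves.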
About the journal:
Swarm and Evolutionary Computation is a pioneering peer-reviewed journal focused on the latest research and advancements in nature-inspired intelligent computation using swarm and evolutionary algorithms. It covers theoretical, experimental, and practical aspects of these paradigms and their hybrids, promoting interdisciplinary research. The journal prioritizes the publication of high-quality, original articles that push the boundaries of evolutionary computation and swarm intelligence. Additionally, it welcomes survey papers on current topics and novel applications. Topics of interest include but are not limited to: Genetic Algorithms and Genetic Programming, Evolution Strategies and Evolutionary Programming, Differential Evolution, Artificial Immune Systems, Particle Swarms, Ant Colony, Bacterial Foraging, Artificial Bees, Fireflies Algorithm, Harmony Search, Artificial Life, Digital Organisms, Estimation of Distribution Algorithms, Stochastic Diffusion Search, Quantum Computing, Nano Computing, Membrane Computing, Human-centric Computing, Hybridization of Algorithms, Memetic Computing, Autonomic Computing, Self-organizing Systems, and Combinatorial, Discrete, Binary, Constrained, Multi-objective, Multi-modal, Dynamic, and Large-scale Optimization.