Hybridization of DEBOHID with ENN algorithm for highly imbalanced datasets
Sedat Korkmaz
Engineering Science and Technology, an International Journal (JESTECH), Volume 63, Article 101976 (published 2025-02-07)
DOI: 10.1016/j.jestch.2025.101976
Abstract
Many machine learning algorithms assume that datasets are balanced, but most real-world datasets are imbalanced. Class imbalance is a major challenge in machine learning and data mining, and it is commonly addressed with oversampling and undersampling methods. Edited Nearest Neighbor (ENN) and the Synthetic Minority Oversampling Technique (SMOTE) are widely used methods for undersampling and oversampling, respectively. DEBOHID is a recently proposed differential evolution-based oversampling approach for highly imbalanced datasets. In this work, DEBOHID and ENN are combined into a novel hybrid method called D-ENN. The performance of D-ENN was evaluated on 44 highly imbalanced datasets, and a parameter analysis was conducted to determine the optimal values of its F, CR and D-ENN-Type parameters. Three classifiers were used: Support Vector Machines (SVM), Decision Tree (DT) and k-Nearest Neighbor (kNN), and their G-mean and Area Under the Curve (AUC) values are reported. Averaging the Winner, Mean Rank and Final Rank values over all classifier and metric pairs, D-ENN outperformed nine state-of-the-art sampling methods, with an average Winner value of 13, an average Mean Rank of 3.40 and an average Final Rank of 1.
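The abstract describes D-ENN only at a high level. Below is a minimal, stdlib-only Python sketch of the general idea, under stated assumptions: a textbook DE/rand/1/bin step (mutation with scale factor F, binomial crossover with rate CR) generates synthetic minority samples, Wilson's ENN then removes samples whose label is not shared by a strict majority of their k nearest neighbors, and G-mean is computed as the square root of sensitivity times specificity. The helper names (`de_oversample`, `enn`, `hybrid_resample`, `g_mean`), the default values, the oversample-then-clean order and the choice of k are hypothetical. The sketch does not reproduce DEBOHID's actual sample-selection rules and does not model the D-ENN-Type parameter.

```python
import math
import random
from collections import Counter


def de_oversample(minority, n_new, F=0.5, CR=0.9, seed=0):
    """Create n_new synthetic minority points with generic DE/rand/1/bin.

    Hypothetical helper: textbook differential evolution applied to minority
    samples, not the published DEBOHID procedure.
    """
    if len(minority) < 4:
        raise ValueError("DE/rand/1 needs at least 4 minority samples")
    rng = random.Random(seed)
    dim = len(minority[0])
    synthetic = []
    for _ in range(n_new):
        # Target sample plus three donors, all mutually distinct.
        i, r1, r2, r3 = rng.sample(range(len(minority)), 4)
        target, a, b, c = minority[i], minority[r1], minority[r2], minority[r3]
        # Mutation: v = a + F * (b - c).
        mutant = [a[j] + F * (b[j] - c[j]) for j in range(dim)]
        # Binomial crossover: each coordinate comes from the mutant with
        # probability CR; j_rand forces at least one mutant coordinate.
        j_rand = rng.randrange(dim)
        trial = [mutant[j] if (j == j_rand or rng.random() < CR) else target[j]
                 for j in range(dim)]
        synthetic.append(tuple(trial))
    return synthetic


def enn(X, y, k=3):
    """Wilson's Edited Nearest Neighbor: drop every sample whose label is not
    shared by a strict majority of its k nearest neighbors (Euclidean)."""
    kept_X, kept_y = [], []
    for i, (xi, yi) in enumerate(zip(X, y)):
        neighbors = sorted(
            (math.dist(xi, xj), y[j]) for j, xj in enumerate(X) if j != i
        )
        labels = [label for _, label in neighbors[:k]]
        if labels.count(yi) > k // 2:
            kept_X.append(xi)
            kept_y.append(yi)
    return kept_X, kept_y


def hybrid_resample(X, y, minority_label=1, F=0.5, CR=0.9, k=3, seed=0):
    """Hypothetical D-ENN-style pipeline: oversample to balance, then clean.

    Assumes a binary problem; the oversample-then-clean order is an assumption.
    """
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    n_new = max(0, n_majority - len(minority))
    synthetic = de_oversample(minority, n_new, F, CR, seed)
    X_bal = list(X) + synthetic
    y_bal = list(y) + [minority_label] * len(synthetic)
    return enn(X_bal, y_bal, k)


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity (minority recall) and specificity."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


# Toy usage: a 5x5 majority grid and a 4-point minority cluster far away.
majority = [(float(a), float(b)) for a in range(5) for b in range(5)]
minority = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
X = majority + minority
y = [0] * len(majority) + [1] * len(minority)
X_res, y_res = hybrid_resample(X, y, F=0.5, CR=0.9, k=3)
print(Counter(y_res))
```

On this well-separated toy data ENN removes nothing after balancing, so both classes end with 25 samples; on overlapping data ENN would also delete original or synthetic points that fall among the other class. Training the SVM, DT and kNN classifiers and computing AUC are omitted from the sketch.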
About the journal:
Engineering Science and Technology, an International Journal (JESTECH) (formerly Technology) is a peer-reviewed quarterly engineering journal. It publishes high-quality theoretical and experimental papers of lasting interest, not previously published in journals, in engineering and applied science, with the aim of promoting the theory and practice of technology and engineering. In addition to peer-reviewed original research papers, the Editorial Board welcomes original research reports, state-of-the-art reviews and communications in the broadly defined field of engineering science and technology.
The scope of JESTECH covers a wide spectrum of subjects, including:
-Electrical/Electronics and Computer Engineering (Biomedical Engineering and Instrumentation; Coding, Cryptography, and Information Protection; Communications, Networks, Mobile Computing and Distributed Systems; Compilers and Operating Systems; Computer Architecture, Parallel Processing, and Dependability; Computer Vision and Robotics; Control Theory; Electromagnetic Waves, Microwave Techniques and Antennas; Embedded Systems; Integrated Circuits, VLSI Design, Testing, and CAD; Microelectromechanical Systems; Microelectronics, and Electronic Devices and Circuits; Power, Energy and Energy Conversion Systems; Signal, Image, and Speech Processing)
-Mechanical and Civil Engineering (Automotive Technologies; Biomechanics; Construction Materials; Design and Manufacturing; Dynamics and Control; Energy Generation, Utilization, Conversion, and Storage; Fluid Mechanics and Hydraulics; Heat and Mass Transfer; Micro-Nano Sciences; Renewable and Sustainable Energy Technologies; Robotics and Mechatronics; Solid Mechanics and Structure; Thermal Sciences)
-Metallurgical and Materials Engineering (Advanced Materials Science; Biomaterials; Ceramic and Inorganic Materials; Electronic-Magnetic Materials; Energy and Environment; Materials Characterization; Metallurgy; Polymers and Nanocomposites)