HHO-SMOTe: Efficient Sampling Rate for Synthetic Minority Oversampling Technique Based on Harris Hawk Optimization

Impact factor 0.7; JCR Q3 (Computer Science, Theory & Methods)
Khaled SH. Raslan, Almohammady S. Alsharkawy, K. R. Raslan
International Journal of Advanced Computer Science and Applications, 2023. DOI: 10.14569/ijacsa.2023.0141047 (https://doi.org/10.14569/ijacsa.2023.0141047)
Citations: 0

Abstract

Classifying imbalanced datasets presents a significant challenge in the field of machine learning, especially with big data, where instances are unevenly distributed among classes, leading to class imbalance issues that affect classifier performance. Synthetic Minority Over-sampling Technique (SMOTE) is an effective oversampling method that addresses this by generating new instances for the under-represented minority class. However, SMOTE's efficiency relies on the sampling rate for minority class instances, making optimal sampling rates crucial for solving class imbalance. In this paper, we introduce HHO-SMOTe, a novel hybrid approach that combines the Harris Hawk optimization (HHO) search algorithm with SMOTE to enhance classification accuracy by determining optimal sample rates for each dataset. We conducted extensive experiments across diverse datasets to comprehensively evaluate our binary classification model. The results demonstrated our model's exceptional performance, with an AUC score exceeding 0.96, a high G-means score of 0.95 highlighting its robustness, and an outstanding F1-score consistently exceeding 0.99. These findings collectively establish our proposed approach as a formidable contender in the domain of binary classification models.
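The abstract does not describe the search procedure, the classifier, or the datasets in detail, so the following is only a minimal sketch of the underlying idea: the quality of a SMOTE-augmented training set depends on the sampling rate, so the rate can be treated as a parameter to search over. Here a plain search over a few candidate rates stands in for the Harris Hawk optimizer, a small k-NN classifier stands in for the paper's model, and G-mean is used as the fitness score. All function names (`smote`, `g_mean`, `knn_predict`, `best_rate`) and the synthetic data are hypothetical and not taken from the paper.

```python
import math
import random


def smote(minority, rate, k=3, seed=0):
    """SMOTE-style interpolation: return round(rate * len(minority)) synthetic points.

    Assumes at least two minority points whenever rate > 0.
    """
    rng = random.Random(seed)
    n_new = round(rate * len(minority))
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding the point itself)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(p, base))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def knn_predict(train, x, k=3):
    """Majority vote among the k nearest training rows; labels are 0 or 1."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > len(nearest) else 0


def best_rate(majority, minority, val, rates, k=3):
    """Score each candidate sampling rate by validation G-mean (stand-in for HHO)."""
    scores = {}
    for rate in rates:
        train = ([(p, 0) for p in majority] + [(p, 1) for p in minority]
                 + [(p, 1) for p in smote(minority, rate, k)])
        preds = [knn_predict(train, x, k) for x, _ in val]
        scores[rate] = g_mean([y for _, y in val], preds)
    return max(scores, key=scores.get), scores


# Hypothetical imbalanced toy data: 60 majority points vs. 8 minority points.
rng = random.Random(42)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(60)]
minority = [(rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(8)]
val = ([((rng.gauss(0, 1), rng.gauss(0, 1)), 0) for _ in range(30)]
       + [((rng.gauss(2, 1), rng.gauss(2, 1)), 1) for _ in range(10)])

best, scores = best_rate(majority, minority, val, rates=[0.0, 1.0, 2.0, 4.0, 6.0])
print("best rate:", best)
print("G-mean per rate:", scores)
```

A metaheuristic such as HHO would replace the fixed list of candidate rates with a population of candidate rates that is updated iteratively, while the fitness evaluation (oversample, train, score) could keep the same shape as `best_rate`.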
Source journal
CiteScore: 2.30
Self-citation rate: 22.20%
Articles published: 519
About the journal: IJACSA is a scholarly computer science journal whose mission is to provide an outlet for quality research to be publicised and published to a global audience. The journal aims to publish papers selected through rigorous double-blind peer review to ensure originality, timeliness, relevance, and readability. In line with the journal's vision "to be a respected publication that publishes peer reviewed research articles, as well as review and survey papers contributed by International community of Authors", its reviewers and editors are drawn from institutions and universities across the globe. International Journal of Advanced Computer Science and Applications publishes carefully refereed research, review, and survey papers that offer a significant contribution to the computer science literature and are of interest to a wide audience. Coverage extends to all mainstream branches of computer science and related applications.