A hierarchical heterogeneous ant colony optimization based oversampling algorithm using feature similarity for classification of imbalanced data

IF 7.2, Zone 1 (Computer Science), JCR Q1 (Computer Science, Artificial Intelligence)

Applied Soft Computing, Pub Date: 2024-09-04, DOI: 10.1016/j.asoc.2024.112186

Citations: 0

Abstract

Imbalanced data classification is one of the challenging problems in machine learning. Oversampling is a promising technique that generates synthetic minority instances to balance the dataset. Inappropriately generated minority instances, however, may degrade the performance of the classifier. Most oversampling algorithms create new minority instances by choosing nearest neighbors for random interpolation. These methods do not add new information to the dataset, so standard classifiers do not perform well on the resulting data. It is therefore necessary to generate diverse minority class instances to improve classifier performance. Since every feature of each minority class instance contributes valuable information, generating synthetic instances from the features of all minority instances would produce diverse minority instances and thereby improve classifier performance. This paper proposes a Hierarchical Heterogeneous Ant Colony Optimization based oversampling algorithm using Feature Similarity (HHACO-FSOTe) for the generation of synthetic minority instances. Instead of choosing a few neighbors for interpolation, the proposal considers all minority instances when generating synthetic instances. HHACO-FSOTe generates new feature values by computing the minimum absolute difference between the features of a given minority instance and the corresponding features of the remaining minority instances. The features in the dataset are distributed among the ant agents, which enables parallelism and reduces the time taken for oversampling. HHACO-FSOTe does not require parameter tuning or training. The proposal is evaluated on 41 low-dimensional, 11 high-dimensional and 8 noisy datasets. Experiments show that HHACO-FSOTe is competitive with state-of-the-art oversampling techniques. Results were validated using non-parametric statistical tests.
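The abstract only states that new feature values come from the minimum absolute difference between a minority instance's features and the corresponding features of the remaining minority instances. The sketch below illustrates that one idea and nothing more. It is not the authors' HHACO-FSOTe implementation: the ant colony hierarchy and the parallel distribution of features across agents are omitted, the function names `synthesize_from` and `oversample` are hypothetical, and the choice to place the new value at the midpoint between the base value and its closest match is an assumption made here for illustration, since the abstract does not say how the closest value is turned into a synthetic one.

```python
import numpy as np


def synthesize_from(X_min, i):
    """Build one synthetic row from base instance i (hypothetical helper).

    For every feature, find the other minority instance whose value has the
    minimum absolute difference from the base instance's value.
    """
    X_min = np.asarray(X_min, dtype=float)
    n, d = X_min.shape
    if n < 2:
        raise ValueError("need at least two minority instances")
    # Absolute difference of every instance to the base, per feature: shape (n, d).
    diff = np.abs(X_min - X_min[i])
    diff[i, :] = np.inf  # exclude the base instance itself
    # Row index of the closest instance, chosen independently for each feature.
    nearest = diff.argmin(axis=0)
    closest_vals = X_min[nearest, np.arange(d)]
    # ASSUMPTION (not from the abstract): use the midpoint as the new value.
    return (X_min[i] + closest_vals) / 2.0


def oversample(X_min, n_new, seed=None):
    """Generate n_new synthetic rows from randomly chosen base instances."""
    X_min = np.asarray(X_min, dtype=float)
    rng = np.random.default_rng(seed)
    bases = rng.integers(len(X_min), size=n_new)
    return np.vstack([synthesize_from(X_min, i) for i in bases])


if __name__ == "__main__":
    X = np.array([[1.0, 10.0], [2.0, 30.0], [5.0, 12.0]])
    print(synthesize_from(X, 0))  # [ 1.5 11. ]
    print(oversample(X, 4, seed=0).shape)  # (4, 2)
```

Because the closest match is picked per feature, a single synthetic row can combine values drawn from several different minority instances, which is one way to read the abstract's claim about diversity. Each feature column is also processed independently here, which is consistent with the abstract's remark that features can be distributed among agents.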

Source journal

Applied Soft Computing (Engineering & Technology, Computer Science: Interdisciplinary Applications)

CiteScore

15.80

Self-citation rate

6.90%

Articles published

874

Review time

10.9 months

About the journal: Applied Soft Computing is an international journal promoting an integrated view of soft computing to solve real-life problems. The focus is to publish the highest quality research in the application and convergence of Fuzzy Logic, Neural Networks, Evolutionary Computing, Rough Sets and other similar techniques to address real-world complexities. Applied Soft Computing is a rolling publication: articles are published as soon as the editor-in-chief has accepted them. The website is therefore updated continuously with new articles, and the publication time is short.