Journal: Applied Soft Computing (Impact Factor 7.2; JCR Q1, Computer Science, Artificial Intelligence). Publication type: Journal Article. Publication date: 2024-09-04. DOI: 10.1016/j.asoc.2024.112186
Source: https://www.sciencedirect.com/science/article/pii/S1568494624009608
A hierarchical heterogeneous ant colony optimization based oversampling algorithm using feature similarity for classification of imbalanced data
Imbalanced data classification is one of the challenging problems in machine learning. Oversampling is a promising technique that generates synthetic minority instances to balance the dataset. Inappropriately generated minority instances, however, may degrade the performance of the classifier. Most oversampling algorithms create new minority instances by choosing nearest neighbors for random interpolation. These methods do not add new information to the dataset, so standard classifiers do not perform well on the resulting data. It is therefore necessary to generate diverse minority class instances to improve classifier performance. Since every feature of each minority class instance contributes valuable information, generating synthetic instances from the features of all minority instances would produce diverse minority instances, thereby improving classifier performance. This paper proposes a Hierarchical Heterogeneous Ant Colony Optimization based oversampling algorithm using Feature Similarity (HHACO-FSOTe) for the generation of synthetic minority instances. Instead of choosing a few neighbors for interpolation, the proposal considers all minority instances when generating synthetic instances. HHACO-FSOTe generates new feature values by computing the minimum absolute difference between the features of a given minority instance and the corresponding features of the remaining minority instances. The features in the dataset are distributed among the ant agents, enabling parallelism and thereby reducing the time taken for oversampling. HHACO-FSOTe does not require parameter tuning or training. The proposal is evaluated on 41 low-dimensional, 11 high-dimensional and 8 noisy datasets. Experiments reveal that HHACO-FSOTe is competitive with state-of-the-art oversampling techniques. Results were validated using non-parametric statistical tests.
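To make the feature-similarity step concrete, the following is a minimal Python sketch, not the authors' implementation. It covers only the per-feature search for the minimum absolute difference across all other minority instances. The abstract does not say how that difference is turned into a new feature value, so the midpoint rule used here is an assumption, as are the function names `closest_feature_value` and `synthesize`. The ant colony hierarchy and the parallel distribution of features among agents are omitted.

```python
from typing import List


def closest_feature_value(column: List[float], i: int) -> float:
    """Return the value in `column`, excluding index i, with the minimum
    absolute difference from column[i]."""
    best_value = None
    best_diff = float("inf")
    for k, value in enumerate(column):
        if k == i:
            continue
        diff = abs(value - column[i])
        if diff < best_diff:
            best_diff = diff
            best_value = value
    return best_value


def synthesize(minority: List[List[float]], i: int) -> List[float]:
    """Build one synthetic instance from minority instance i.

    Each feature is matched independently against the same feature of every
    other minority instance, so different features may be paired with
    different instances.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority instances")
    synthetic = []
    for j in range(len(minority[0])):
        column = [row[j] for row in minority]
        closest = closest_feature_value(column, i)
        # ASSUMPTION: the new value is the midpoint between the instance's
        # feature and its closest counterpart; the abstract does not state
        # the actual generation rule.
        synthetic.append((column[i] + closest) / 2.0)
    return synthetic
```

For the minority set `[[1.0, 10.0], [2.0, 40.0], [5.0, 12.0]]`, `synthesize(minority, 0)` pairs the first feature with the second instance and the second feature with the third instance, giving `[1.5, 11.0]`. Because each feature column is processed independently, the loop over `j` is the part that could be split among parallel workers, which is consistent with the abstract's description of distributing features among ant agents.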
About the journal:
Applied Soft Computing is an international journal promoting an integrated view of soft computing to solve real-life problems. The focus is to publish the highest-quality research in the application and convergence of the areas of Fuzzy Logic, Neural Networks, Evolutionary Computing, Rough Sets and other similar techniques to address real-world complexities.
Applied Soft Computing is a rolling publication: articles are published as soon as the editor-in-chief has accepted them. Therefore, the web site will continuously be updated with new articles and the publication time will be short.