Multiple adaptive over-sampling for imbalanced data evidential classification

IF 7.5 · Zone 2, Computer Science · JCR Q1, AUTOMATION & CONTROL SYSTEMS

Engineering Applications of Artificial Intelligence · Pub Date: 2024-05-11 · DOI: 10.1016/j.engappai.2024.108532

Zhen Zhang, Hong-peng Tian, Jin-shuai Jin

Volume 133, Article 108532 · https://www.sciencedirect.com/science/article/pii/S0952197624006900

Citations: 0

Abstract



Over-sampling approaches generate samples to balance the dataset and have been widely applied in classifying imbalanced data. However, existing approaches do not take into account the uncertainty of the generated samples, which may alter the data distribution and introduce uncertain information into the classification process. To tackle this issue, we propose a multiple adaptive over-sampling approach (MAO) for classifying imbalanced data based on evidence reasoning. First, we construct balanced training sets through multiple adaptive over-sampling of the minority class, which characterizes the uncertainty of over-sampling. Then, we define the intra- and inter-class inconsistency of the data distribution after over-sampling to quantify the weights of the classifiers trained on the different balanced subsets, weakening the negative impact of changes in the data distribution on classification. Finally, we employ neighbor information to revise the results of samples that are hard to classify correctly, which partly mitigates the risk of misclassification caused by uncertain synthetic samples. The effectiveness of MAO has been verified on several real imbalanced datasets through comparison with related approaches.
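The abstract names the three stages of MAO but gives no formulas, so the following is only a minimal, hypothetical sketch of how such a pipeline could be wired together. It is not the authors' method. Every component is a stand-in chosen for illustration: SMOTE-style interpolation replaces their adaptive over-sampling; a nearest-centroid classifier serves as the base learner; a neighborhood-purity score (`reliability`) replaces the intra-/inter-class inconsistency weights; Shafer discounting plus Dempster's rule of combination provide the evidential fusion; and a k-NN vote on low-confidence samples stands in for the neighbor-based revision. All function names (`smote_like`, `reliability`, `centroid_masses`, `dempster`, `mao_sketch`) and parameters (`n_sets`, `k`, `hard`) are assumptions, as are binary labels with minority class 1.

```python
import numpy as np


def smote_like(X_min, n_new, k, rng):
    """SMOTE-style interpolation: each synthetic point lies on the segment between
    a random minority seed and one of that seed's k nearest minority neighbors."""
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbor
    nn = np.argsort(d, axis=1)[:, :k]
    seeds = rng.integers(0, len(X_min), n_new)
    nbrs = nn[seeds, rng.integers(0, k, n_new)]
    gap = rng.random((n_new, 1))
    return X_min[seeds] + gap * (X_min[nbrs] - X_min[seeds])


def reliability(X_syn, X, y, k, minority=1):
    """HYPOTHETICAL weight in [0, 1]: share of synthetic points whose k nearest
    original samples are at least half minority. A stand-in only; the paper's
    intra-/inter-class inconsistency measures are not given in the abstract."""
    d = np.linalg.norm(X_syn[:, None, :] - X[None, :, :], axis=2)
    nn = np.argsort(d, axis=1)[:, :k]
    return float(((y[nn] == minority).mean(axis=1) >= 0.5).mean())


def centroid_masses(Xb, yb, X_test, w):
    """Nearest-centroid class probabilities discounted by reliability w (Shafer
    discounting). Columns: m({0}), m({1}), m(Theta); the mass 1 - w goes to Theta."""
    cents = np.stack([Xb[yb == c].mean(axis=0) for c in (0, 1)])
    d = np.linalg.norm(X_test[:, None, :] - cents[None, :, :], axis=2)
    e = np.exp(-(d - d.min(axis=1, keepdims=True)))  # shift for numerical stability
    p = e / e.sum(axis=1, keepdims=True)
    return np.column_stack([w * p[:, 0], w * p[:, 1], np.full(len(X_test), 1.0 - w)])


def dempster(a, b):
    """Dempster's rule of combination on the binary frame {0, 1}, row by row."""
    m0 = a[:, 0] * b[:, 0] + a[:, 0] * b[:, 2] + a[:, 2] * b[:, 0]
    m1 = a[:, 1] * b[:, 1] + a[:, 1] * b[:, 2] + a[:, 2] * b[:, 1]
    mt = a[:, 2] * b[:, 2]
    conflict = a[:, 0] * b[:, 1] + a[:, 1] * b[:, 0]  # mass on the empty set
    return np.column_stack([m0, m1, mt]) / (1.0 - conflict)[:, None]


def mao_sketch(X, y, X_test, n_sets=5, k=5, hard=0.6, seed=0):
    """Assumes binary labels, minority class = 1, fewer minority than majority samples."""
    rng = np.random.default_rng(seed)
    X_min = X[y == 1]
    n_new = int((y == 0).sum() - (y == 1).sum())
    fused = None
    for _ in range(n_sets):
        X_syn = smote_like(X_min, n_new, k, rng)  # a differently seeded balanced set
        w = reliability(X_syn, X, y, k)
        Xb = np.vstack([X, X_syn])
        yb = np.concatenate([y, np.ones(n_new, dtype=y.dtype)])
        m = centroid_masses(Xb, yb, X_test, w)
        fused = m if fused is None else dempster(fused, m)
    betp = fused[:, :2] + fused[:, 2:] / 2.0  # pignistic probabilities
    pred = betp.argmax(axis=1)
    # HYPOTHETICAL revision step: low-confidence samples are relabeled by a
    # majority vote of their k nearest ORIGINAL training samples.
    unsure = betp.max(axis=1) < hard
    if unsure.any():
        d = np.linalg.norm(X_test[unsure][:, None, :] - X[None, :, :], axis=2)
        votes = y[np.argsort(d, axis=1)[:, :k]]
        pred[unsure] = (votes.mean(axis=1) > 0.5).astype(pred.dtype)
    return pred, betp


if __name__ == "__main__":
    # Toy 90:10 imbalanced problem with two Gaussian blobs (synthetic data).
    rng = np.random.default_rng(42)
    X = np.vstack([rng.normal(0.0, 1.0, (90, 2)), rng.normal(3.0, 1.0, (10, 2))])
    y = np.array([0] * 90 + [1] * 10)
    pred, betp = mao_sketch(X, y, np.array([[0.0, 0.0], [3.0, 3.0]]))
    print(pred, betp.round(3))
```

The design choice worth noting is that each balanced set is generated with a fresh random draw, so the variability of the synthetic samples appears as disagreement among the fused mass functions. Discounting then moves the belief of a less reliable set onto Θ (ignorance) rather than letting it vote with full weight.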


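Source journal

Engineering Applications of Artificial Intelligence (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 9.60

Self-citation rate: 10.00%

Articles published: 505

Review time: 68 days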

Journal description: Artificial Intelligence (AI) is pivotal in driving the fourth industrial revolution, witnessing remarkable advancements across various machine learning methodologies. AI techniques have become indispensable tools for practicing engineers, enabling them to tackle previously insurmountable challenges. Engineering Applications of Artificial Intelligence serves as a global platform for the swift dissemination of research elucidating the practical application of AI methods across all engineering disciplines. Submitted papers are expected to present novel aspects of AI utilized in real-world engineering applications, validated using publicly available datasets to ensure the replicability of research outcomes. Join us in exploring the transformative potential of AI in engineering.