Absent data generating classifier for imbalanced class sizes

J. Mach. Learn. Res., pp. 2695-2724. Pub Date: 2015-01-01. DOI: 10.5555/2789272.2912085

Arash Pourhabib, B. Mallick, Yu Ding

Citations: 24

Abstract



We propose an algorithm for two-class classification problems when the training data are imbalanced. This means the number of training instances in one of the classes is so low that the conventional classification algorithms become ineffective in detecting the minority class. We present a modification of the kernel Fisher discriminant analysis such that the imbalanced nature of the problem is explicitly addressed in the new algorithm formulation. The new algorithm exploits the properties of the existing minority points to learn the effects of other minority data points, had they actually existed. The algorithm proceeds iteratively by employing the learned properties and conditional sampling in such a way that it generates sufficient artificial data points for the minority set, thus enhancing the detection probability of the minority class. Implementing the proposed method on a number of simulated and real data sets, we show that our proposed method performs competitively compared to a set of alternative state-of-the-art imbalanced classification algorithms.
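For orientation, the baseline that the abstract says is being modified, two-class kernel Fisher discriminant analysis, can be sketched as follows. This is a minimal illustration of the standard, unmodified method and not the authors' algorithm: it does not generate artificial minority points or perform conditional sampling. The RBF kernel, the ridge term `reg`, the midpoint threshold, the toy data, and all function names are assumptions made for the example.

```python
import numpy as np


def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))


def fit_kfd(X, y, gamma=1.0, reg=1e-3):
    """Fit a standard two-class kernel Fisher discriminant (labels 0 and 1).

    Returns the expansion coefficients alpha and a decision threshold.
    The ridge term `reg` is an assumption added for numerical stability.
    """
    K = rbf_kernel(X, X, gamma)
    n = len(y)
    class_means = []
    N = np.zeros((n, n))  # within-class scatter expressed in kernel space
    for label in (0, 1):
        Kj = K[:, y == label]                # kernel columns of this class
        nj = Kj.shape[1]
        class_means.append(Kj.mean(axis=1))  # kernel-space class mean
        centering = np.eye(nj) - np.full((nj, nj), 1.0 / nj)
        N += Kj @ centering @ Kj.T
    # Fisher direction: alpha = (N + reg * I)^{-1} (M_1 - M_0)
    alpha = np.linalg.solve(N + reg * np.eye(n),
                            class_means[1] - class_means[0])
    # Assumed decision rule: midpoint between the projected class means
    threshold = 0.5 * (alpha @ class_means[0] + alpha @ class_means[1])
    return alpha, threshold


def predict_kfd(X_train, alpha, threshold, X_new, gamma=1.0):
    """Project new points onto the discriminant and threshold the scores."""
    scores = rbf_kernel(X_new, X_train, gamma) @ alpha
    return (scores > threshold).astype(int)


# Toy imbalanced data (hypothetical): 40 majority points, 5 minority points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, size=(40, 2)),
               rng.normal(3.0, 0.5, size=(5, 2))])
y = np.array([0] * 40 + [1] * 5)

alpha, threshold = fit_kfd(X, y)
predictions = predict_kfd(X, alpha, threshold, X)
```

On well-separated toy clusters like these the baseline has little difficulty; the imbalance problem the abstract describes arises when the few minority points overlap with or are swamped by the majority class, which is the setting the proposed modification targets.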
