Prototype Selection and Generation with Minority Classes Preservation

Konstantinos Xouveroudis, Stefanos Ougiaroglou, Georgios Evangelidis, D. Dervos
{"title":"保留少数类的原型选择与生成","authors":"Konstantinos Xouveroudis, Stefanos Ougiaroglou, Georgios Evangelidis, D. Dervos","doi":"10.1109/IISA52424.2021.9555514","DOIUrl":null,"url":null,"abstract":"Instance-based classifiers become inefficient when the size of their training dataset or model is large. Therefore, they are usually applied in conjunction with a Data Reduction Technique that collects prototypes from the available training data. The set of prototypes is called the condensing set and has the benefit of low computational cost during classification, while, at the same time, accuracy is not negatively affected. In case of imbalanced training data, the number of prototypes collected for the minority (rare) classes may be insufficient. Even worse, the rare classes may be eliminated. This paper presents three methods that preserve the rare classes when data reduction is applied. Two of the methods apply data reduction only on the instances that belong to common classes and avoid costly under-sampling or over-sampling procedures that deal with class imbalances. The third method utilizes SMOTE over-sampling before data reduction. The three methods were tested by conducting experiments on twelve imbalanced datasets. Experimental results reveal high recall and very good reduction rates.","PeriodicalId":437496,"journal":{"name":"2021 12th International Conference on Information, Intelligence, Systems & Applications (IISA)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Prototype Selection and Generation with Minority Classes Preservation\",\"authors\":\"Konstantinos Xouveroudis, Stefanos Ougiaroglou, Georgios Evangelidis, D. 
Dervos\",\"doi\":\"10.1109/IISA52424.2021.9555514\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Instance-based classifiers become inefficient when the size of their training dataset or model is large. Therefore, they are usually applied in conjunction with a Data Reduction Technique that collects prototypes from the available training data. The set of prototypes is called the condensing set and has the benefit of low computational cost during classification, while, at the same time, accuracy is not negatively affected. In case of imbalanced training data, the number of prototypes collected for the minority (rare) classes may be insufficient. Even worse, the rare classes may be eliminated. This paper presents three methods that preserve the rare classes when data reduction is applied. Two of the methods apply data reduction only on the instances that belong to common classes and avoid costly under-sampling or over-sampling procedures that deal with class imbalances. The third method utilizes SMOTE over-sampling before data reduction. The three methods were tested by conducting experiments on twelve imbalanced datasets. 
Experimental results reveal high recall and very good reduction rates.\",\"PeriodicalId\":437496,\"journal\":{\"name\":\"2021 12th International Conference on Information, Intelligence, Systems & Applications (IISA)\",\"volume\":\"40 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-07-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 12th International Conference on Information, Intelligence, Systems & Applications (IISA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IISA52424.2021.9555514\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 12th International Conference on Information, Intelligence, Systems & Applications (IISA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IISA52424.2021.9555514","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citation count: 0

Abstract

Instance-based classifiers become inefficient when the size of their training dataset or model is large. Therefore, they are usually applied in conjunction with a Data Reduction Technique that collects prototypes from the available training data. The set of prototypes is called the condensing set and has the benefit of low computational cost during classification, while, at the same time, accuracy is not negatively affected. In case of imbalanced training data, the number of prototypes collected for the minority (rare) classes may be insufficient. Even worse, the rare classes may be eliminated. This paper presents three methods that preserve the rare classes when data reduction is applied. Two of the methods apply data reduction only on the instances that belong to common classes and avoid costly under-sampling or over-sampling procedures that deal with class imbalances. The third method utilizes SMOTE over-sampling before data reduction. The three methods were tested by conducting experiments on twelve imbalanced datasets. Experimental results reveal high recall and very good reduction rates.
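The core idea described above — reduce only the common classes and copy every rare-class instance into the condensing set unchanged — can be sketched as follows. This is a minimal illustration, not the authors' actual algorithms: the `reduce_preserving_minorities` function, the `rare_threshold` and `group_size` parameters, and the group-averaging step (a simple stand-in for the paper's prototype-generation technique) are all assumptions made for the example.

```python
import numpy as np

def reduce_preserving_minorities(X, y, rare_threshold=10, group_size=5):
    """Prototype generation that preserves rare classes.

    Classes with fewer than `rare_threshold` instances are copied into
    the condensing set unchanged; common classes are condensed by
    averaging groups of `group_size` instances (a toy stand-in for the
    data reduction technique used in the paper).
    """
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    proto_X, proto_y = [], []
    for label in np.unique(y):
        members = X[y == label]
        if len(members) < rare_threshold:
            # Rare class: keep every instance, so the class cannot
            # be eliminated by the reduction step.
            proto_X.append(members)
            proto_y.append(np.full(len(members), label))
        else:
            # Common class: replace each group of instances by its
            # mean, producing one prototype per group.
            n_groups = max(1, len(members) // group_size)
            groups = np.array_split(members, n_groups)
            proto_X.append(np.stack([g.mean(axis=0) for g in groups]))
            proto_y.append(np.full(n_groups, label))
    return np.vstack(proto_X), np.concatenate(proto_y)
```

Condensing only the majority classes this way avoids the cost of under- or over-sampling while guaranteeing that minority classes survive the reduction; the paper's third method instead applies SMOTE over-sampling to the minority classes before reducing the whole set.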