Prototype Selection and Generation with Minority Classes Preservation

Konstantinos Xouveroudis, Stefanos Ougiaroglou, Georgios Evangelidis, D. Dervos
{"title":"保留少数类的原型选择与生成","authors":"Konstantinos Xouveroudis, Stefanos Ougiaroglou, Georgios Evangelidis, D. Dervos","doi":"10.1109/IISA52424.2021.9555514","DOIUrl":null,"url":null,"abstract":"Instance-based classifiers become inefficient when the size of their training dataset or model is large. Therefore, they are usually applied in conjunction with a Data Reduction Technique that collects prototypes from the available training data. The set of prototypes is called the condensing set and has the benefit of low computational cost during classification, while, at the same time, accuracy is not negatively affected. In case of imbalanced training data, the number of prototypes collected for the minority (rare) classes may be insufficient. Even worse, the rare classes may be eliminated. This paper presents three methods that preserve the rare classes when data reduction is applied. Two of the methods apply data reduction only on the instances that belong to common classes and avoid costly under-sampling or over-sampling procedures that deal with class imbalances. The third method utilizes SMOTE over-sampling before data reduction. The three methods were tested by conducting experiments on twelve imbalanced datasets. Experimental results reveal high recall and very good reduction rates.","PeriodicalId":437496,"journal":{"name":"2021 12th International Conference on Information, Intelligence, Systems & Applications (IISA)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Prototype Selection and Generation with Minority Classes Preservation\",\"authors\":\"Konstantinos Xouveroudis, Stefanos Ougiaroglou, Georgios Evangelidis, D. 
Dervos\",\"doi\":\"10.1109/IISA52424.2021.9555514\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Instance-based classifiers become inefficient when the size of their training dataset or model is large. Therefore, they are usually applied in conjunction with a Data Reduction Technique that collects prototypes from the available training data. The set of prototypes is called the condensing set and has the benefit of low computational cost during classification, while, at the same time, accuracy is not negatively affected. In case of imbalanced training data, the number of prototypes collected for the minority (rare) classes may be insufficient. Even worse, the rare classes may be eliminated. This paper presents three methods that preserve the rare classes when data reduction is applied. Two of the methods apply data reduction only on the instances that belong to common classes and avoid costly under-sampling or over-sampling procedures that deal with class imbalances. The third method utilizes SMOTE over-sampling before data reduction. The three methods were tested by conducting experiments on twelve imbalanced datasets. 
Experimental results reveal high recall and very good reduction rates.\",\"PeriodicalId\":437496,\"journal\":{\"name\":\"2021 12th International Conference on Information, Intelligence, Systems & Applications (IISA)\",\"volume\":\"40 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-07-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 12th International Conference on Information, Intelligence, Systems & Applications (IISA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IISA52424.2021.9555514\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 12th International Conference on Information, Intelligence, Systems & Applications (IISA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IISA52424.2021.9555514","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citation count: 0

Abstract

Instance-based classifiers become inefficient when the size of their training dataset or model is large. Therefore, they are usually applied in conjunction with a Data Reduction Technique that collects prototypes from the available training data. The set of prototypes is called the condensing set and has the benefit of low computational cost during classification, while, at the same time, accuracy is not negatively affected. In case of imbalanced training data, the number of prototypes collected for the minority (rare) classes may be insufficient. Even worse, the rare classes may be eliminated. This paper presents three methods that preserve the rare classes when data reduction is applied. Two of the methods apply data reduction only on the instances that belong to common classes and avoid costly under-sampling or over-sampling procedures that deal with class imbalances. The third method utilizes SMOTE over-sampling before data reduction. The three methods were tested by conducting experiments on twelve imbalanced datasets. Experimental results reveal high recall and very good reduction rates.
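The core idea described above — reduce only the common classes and copy every rare-class instance into the condensing set unchanged — can be sketched as follows. This is a minimal illustration, not the authors' actual algorithms: the `reduce_preserving_minorities` function, the `rare_threshold` and `group_size` parameters, and the group-averaging step (a simple stand-in for the paper's prototype-generation technique) are all assumptions made for the example.

```python
import numpy as np

def reduce_preserving_minorities(X, y, rare_threshold=10, group_size=5):
    """Prototype generation that preserves rare classes.

    Classes with fewer than `rare_threshold` instances are copied into
    the condensing set unchanged; common classes are condensed by
    averaging groups of `group_size` instances (a toy stand-in for the
    data reduction technique used in the paper).
    """
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    proto_X, proto_y = [], []
    for label in np.unique(y):
        members = X[y == label]
        if len(members) < rare_threshold:
            # Rare class: keep every instance, so the class cannot
            # be eliminated by the reduction step.
            proto_X.append(members)
            proto_y.append(np.full(len(members), label))
        else:
            # Common class: replace each group of instances by its
            # mean, producing one prototype per group.
            n_groups = max(1, len(members) // group_size)
            groups = np.array_split(members, n_groups)
            proto_X.append(np.stack([g.mean(axis=0) for g in groups]))
            proto_y.append(np.full(n_groups, label))
    return np.vstack(proto_X), np.concatenate(proto_y)
```

Condensing only the majority classes this way avoids the cost of under- or over-sampling while guaranteeing that minority classes survive the reduction; the paper's third method instead applies SMOTE over-sampling to the minority classes before reducing the whole set.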