SMOTE-Out, SMOTE-Cosine, and Selected-SMOTE: An enhancement strategy to handle imbalance at the data level

Fajri Koto
Published in: 2014 International Conference on Advanced Computer Science and Information System, 2014-10-01
DOI: https://doi.org/10.1109/ICACSIS.2014.7065849
Citations: 32

Abstract

Imbalanced datasets are often an obstacle in supervised learning. A dataset is imbalanced when the training examples belonging to one class heavily outnumber the examples in the other class. A classifier trained on such data tends to fail to learn the minority class. The Synthetic Minority Oversampling Technique (SMOTE) is a well-known over-sampling method that tackles imbalance at the data level. SMOTE creates a synthetic example between two vectors that lie close together. Our study considers three improvements to SMOTE, which we call SMOTE-Out, SMOTE-Cosine, and Selected-SMOTE, in order to cover cases that SMOTE does not handle. To evaluate the proposed methods, we conducted experiments on eighteen different datasets. The results show that the proposed SMOTE variants give some improvement in B-ACC and F1-Score.
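The abstract describes SMOTE as creating a synthetic example between two close vectors. A minimal sketch of that baseline interpolation step, in plain Python, might look like the following. It illustrates standard SMOTE only, not the proposed SMOTE-Out, SMOTE-Cosine, or Selected-SMOTE variants, whose details the abstract does not give. The function name `smote_sketch`, its parameters, and the toy data are hypothetical, and Euclidean distance is assumed for the neighbour search.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority-class neighbours.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority samples by Euclidean distance to x.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(x, p))
        # Choose one of the k nearest neighbours at random.
        neighbour = rng.choice(others[:k])
        # Take a random point on the segment from x to that neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


# Toy minority class with two features (made-up values).
minority = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.3], [1.1, 1.1]]
new_points = smote_sketch(minority, n_synthetic=6, k=2)
```

Because each synthetic point lies on a segment between two existing minority samples, the new points stay inside the region already spanned by the minority class.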