Fajri Koto
2014 International Conference on Advanced Computer Science and Information System, October 2014
DOI: 10.1109/ICACSIS.2014.7065849
SMOTE-Out, SMOTE-Cosine, and Selected-SMOTE: An enhancement strategy to handle imbalance in data level
An imbalanced dataset often becomes an obstacle in the supervised learning process. Imbalance is the case in which the examples in the training data belonging to one class heavily outnumber the examples in the other class. Applying a classifier to such a dataset results in the classifier failing to learn the minority class. Synthetic Minority Oversampling Technique (SMOTE) is a well-known over-sampling method that tackles imbalance at the data level. SMOTE creates a synthetic example between two close vectors that lie together. Our study considers three improvements of SMOTE, which we call SMOTE-Out, SMOTE-Cosine, and Selected-SMOTE, in order to cover cases that are not already handled by SMOTE. To investigate the proposed methods, our experiments were conducted on eighteen different datasets. The results show that our proposed SMOTE variants give some improvements in B-ACC and F1-Score.
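The interpolation step the abstract refers to — creating a synthetic example between a minority-class vector and one of its close neighbours — can be sketched as follows. This is a minimal illustration of vanilla SMOTE, not the paper's implementation; the function name, the brute-force neighbour search, and the parameter choices are assumptions for clarity.

```python
import numpy as np

def smote(X_min, n_synth, k=5, seed=0):
    """Generate n_synth synthetic minority samples: for each, pick a
    minority sample, pick one of its k nearest minority neighbours,
    and interpolate at a random point on the segment between them."""
    rng = np.random.default_rng(seed)
    n = len(X_min)
    # brute-force pairwise Euclidean distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    # indices of the k nearest neighbours, excluding the point itself
    nn = np.argsort(d, axis=1)[:, 1:k + 1]
    out = np.empty((n_synth, X_min.shape[1]))
    for s in range(n_synth):
        i = rng.integers(n)        # a random minority sample
        j = rng.choice(nn[i])      # one of its nearest neighbours
        gap = rng.random()         # interpolation factor in [0, 1]
        out[s] = X_min[i] + gap * (X_min[j] - X_min[i])
    return out

# toy minority class: four collinear 2-D points
X_min = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
synth = smote(X_min, n_synth=5, k=2)
```

Because every synthetic point lies on a segment between two existing minority points, the over-sampled class stays inside the region the minority data already occupies — which is precisely the limitation that a variant such as SMOTE-Out, generating samples slightly outside that region, is designed to address.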