A Survey Study on Proposed Solutions for Imbalanced Big Data

JCR quartile: Q4 (Earth and Planetary Sciences)
S. Razoqi, Ghayda Al-Talib
Journal: Iraqi Journal of Science
Published: 2024-03-29
Publication type: Journal Article
DOI: 10.24996/ijs.2024.65.3.37 (https://doi.org/10.24996/ijs.2024.65.3.37)
Citation count: 0

Abstract

Learning from imbalanced data has been an active research topic for more than two decades. Training data is considered imbalanced when the positive (minority) class is small enough to be overlooked next to the much larger negative (majority) class, a problem compounded by skewed class distributions in binary tasks. The rise of big data adds new challenges to the imbalance problem. These challenges are commonly summarized as the 5 Vs: volume, velocity, veracity, value, and variety. This study divides solutions to the data imbalance problem into three levels: data-level, algorithm-level, and hybrid approaches. It first presents the standard solutions proposed for this problem and identifies the most important metrics used to measure classification performance on imbalanced data. The survey then reviews 27 studies published between 2015 and 2022, grouped by the level at which they address the imbalance problem. It also reviews the performance metrics used in these studies and the sources of the datasets to which the solutions were applied. The study gives researchers an overview of the available solutions to data imbalance, including the hybrid approaches used recently, so that they can draw on them to improve classification.
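As a rough illustration of two ideas from the abstract, the sketch below shows one data-level remedy and a few imbalance-aware metrics. The abstract names no specific technique or metric, so random oversampling, the G-mean, and all function names here (`imbalance_ratio`, `random_oversample`, `binary_metrics`) are assumptions chosen for illustration, not the methods covered by the surveyed studies.

```python
import math
import random
from collections import Counter


def imbalance_ratio(labels):
    """Ratio of the largest class size to the smallest class size."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_oversample(samples, labels, seed=0):
    """Data-level remedy (illustrative): duplicate randomly chosen samples
    of each smaller class until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy plus metrics that stay informative under class imbalance."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # Geometric mean of recall and specificity.
        "g_mean": math.sqrt(recall * specificity),
    }


if __name__ == "__main__":
    # Toy data: 5 positive (minority) and 95 negative (majority) labels.
    labels = [1] * 5 + [0] * 95
    print("imbalance ratio:", imbalance_ratio(labels))

    # A classifier that always predicts the majority class.
    print(binary_metrics(labels, [0] * 100))

    _, balanced = random_oversample(list(range(100)), labels)
    print("class sizes after oversampling:", Counter(balanced))
```

On this toy data, a classifier that always predicts the majority class reaches an accuracy of 0.95 while its recall and G-mean are both 0, which is why accuracy alone is a poor measure on imbalanced data.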
Source journal: Iraqi Journal of Science (Chemistry: all)
CiteScore: 1.50
Self-citation rate: 0.00%
Articles published: 241