A comprehensive review on data-level methods for imbalanced data classification
Authors: Bahareh Nikpour, Farshad Rahmati, Behzad Mirzaei, Hossein Nezamabadi-pour
Journal: Expert Systems with Applications, Volume 295, Article 128920 (Impact Factor 7.5; JCR Q1, Computer Science, Artificial Intelligence)
Publication date: 2025-07-05
DOI: 10.1016/j.eswa.2025.128920
URL: https://www.sciencedirect.com/science/article/pii/S0957417425025370
Classification is one of the most important tasks in machine learning and data mining. Most classifiers are designed for data sets whose samples are distributed equally among the classes. They therefore struggle with imbalanced data, in which one or more classes have far fewer samples than the others. Imbalanced data sets are prevalent in the real world, so addressing this issue is of utmost importance. Many methods have been proposed to solve this problem, with promising results; one category, data-level methods, is popular for its flexibility. In this paper, our goal is to review data-level methods comprehensively and categorize them from different perspectives. To facilitate future research in this field, we also introduce most of the available benchmark imbalanced data sets, software, and toolboxes. Finally, we discuss existing challenges and directions for future work.
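To make the term concrete: a data-level method changes the training data rather than the classifier. The sketch below shows random oversampling, one of the simplest approaches of this kind. It is an illustration only and is not taken from the paper; the function name `random_oversample`, the toy data, and the fixed seed are assumptions made for the example.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of each smaller class until
    every class has as many samples as the largest class."""
    rng = random.Random(seed)  # fixed seed (assumed) for reproducibility
    counts = Counter(labels)
    target = max(counts.values())

    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        # All original samples that belong to this class.
        pool = [s for s, y in zip(samples, labels) if y == cls]
        # Draw with replacement until the class reaches the target size.
        for _ in range(target - n):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


# Hypothetical toy data: 8 majority samples (class 0), 2 minority samples (class 1).
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.7], [0.8], [5.0], [5.2]]
y = [0] * 8 + [1] * 2

X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))  # both classes now have 8 samples
```

Because the classifier itself is untouched, the balanced output can be passed to any learning algorithm, which is the flexibility the abstract refers to. Plain duplication adds no new information and can encourage overfitting, so this should be read as a baseline sketch rather than a recommended method.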
About the journal:
Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.