{"title":"Comparison of Controlled Undersampling Methods for Machine Learning","authors":"Jiříy Setinský, Martin Žádník","doi":"10.1109/ACDSA59508.2024.10467755","DOIUrl":null,"url":null,"abstract":"Data reduction is an important preprocessing operation for Machine Learning to learn from large datasets, especially in the case of applications requiring online learning using constrained resources. Our survey focuses on a specific family of data reduction methods - controlled undersampling methods. We observe the behaviour of the methods as they cooperate with several supervised machine-learning techniques over multiple evaluation datasets. Our results show that the random undersampling method offers surprisingly good results compared to more complex methods and is a good fit for online and resource-sensitive machine-learning applications.","PeriodicalId":518964,"journal":{"name":"2024 International Conference on Artificial Intelligence, Computer, Data Sciences and Applications (ACDSA)","volume":"55 1","pages":"1-6"},"PeriodicalIF":0.0000,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2024 International Conference on Artificial Intelligence, Computer, Data Sciences and Applications (ACDSA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ACDSA59508.2024.10467755","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Data reduction is an important preprocessing operation that enables Machine Learning to learn from large datasets, especially in applications requiring online learning under constrained resources. Our survey focuses on a specific family of data reduction methods: controlled undersampling. We observe the behaviour of these methods in combination with several supervised machine-learning techniques over multiple evaluation datasets. Our results show that random undersampling offers surprisingly good results compared to more complex methods and is a good fit for online and resource-sensitive machine-learning applications.
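To make the central idea concrete, the following is a minimal sketch of random undersampling, the method the abstract highlights: each majority class is randomly reduced to the size of the smallest class. The function name and structure are illustrative assumptions, not the authors' implementation.

```python
import random
from collections import defaultdict

def random_undersample(X, y, seed=0):
    """Randomly undersample each class down to the minority-class size.

    Illustrative sketch only; the paper's actual implementation may differ.
    """
    rng = random.Random(seed)
    # Group sample indices by class label.
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    # Size of the smallest class determines the target per-class count.
    n_min = min(len(idxs) for idxs in by_class.values())
    # Draw n_min samples uniformly at random from every class.
    kept = []
    for idxs in by_class.values():
        kept.extend(rng.sample(idxs, n_min))
    return [X[i] for i in kept], [y[i] for i in kept]
```

Because the method is a single uniform draw per class, it needs no distance computations or model fitting, which is why it suits the online, resource-constrained settings the survey targets.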