Romero F. A. B. de Morais, P. Miranda, Ricardo Martins
In: 2016 5th Brazilian Conference on Intelligent Systems (BRACIS). Published 2016-10-01. DOI: 10.1109/BRACIS.2016.076 (https://doi.org/10.1109/BRACIS.2016.076)
A Meta-Learning Method to Select Under-Sampling Algorithms for Imbalanced Data Sets
Imbalanced data sets originating from real-world problems, such as medical diagnosis, are pervasive. Learning from imbalanced data sets poses its own challenges, as common classifiers assume a balanced class distribution in the data. Sampling techniques overcome the imbalance by modifying the class distribution of the examples. Unfortunately, selecting a sampling technique together with its parameters is still an open problem. Current solutions include the brute-force approach (try as many techniques as possible) and the random search approach (choose the most appropriate technique from a random subset). In this work, we propose a new method to select sampling techniques for imbalanced data sets. It uses Meta-Learning and works by recommending a technique for an imbalanced data set based on solutions to previous problems. Our experiments compared the proposed method against the brute-force approach, all techniques with their default parameters, and the random search approach. The results show that the proposed method is comparable to the brute-force approach, outperforms the techniques with their default parameters most of the time, and always surpasses the random search approach.
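The idea of recommending a technique "based on solutions to previous problems" can be illustrated with a minimal sketch. The abstract does not say which meta-features, meta-learner, or candidate techniques the authors use, so everything below is an assumption made for illustration: the three meta-features (imbalance ratio, log10 of the number of examples, number of attributes), the k-nearest-neighbour vote, the technique labels, and the numbers in `META_BASE` are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

# Hypothetical meta-base: one entry per previously solved data set.
# Each entry pairs a meta-feature vector
#   (imbalance ratio, log10(number of examples), number of attributes)
# with the label of the under-sampling technique that worked best on it.
# Values and labels are illustrative only, not results from the paper.
META_BASE = [
    ((0.05, 3.0, 10.0), "random_under_sampling"),
    ((0.08, 3.2, 12.0), "random_under_sampling"),
    ((0.30, 2.0, 4.0), "tomek_links"),
    ((0.28, 2.1, 5.0), "tomek_links"),
    ((0.15, 4.0, 30.0), "near_miss"),
]


def euclidean(a, b):
    """Euclidean distance between two meta-feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recommend(meta_features, meta_base=META_BASE, k=3):
    """Recommend a technique by majority vote among the k most similar
    previously solved data sets (an assumed k-NN meta-learner)."""
    neighbours = sorted(
        meta_base, key=lambda entry: euclidean(entry[0], meta_features)
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Meta-features of a new, unseen imbalanced data set (hypothetical values).
    print(recommend((0.06, 3.1, 11.0)))
```

Unlike the brute-force approach, a recommender of this kind evaluates no candidate technique on the new data set; it only computes meta-features and looks up similar past problems. In practice the meta-features would likely need scaling so that no single one dominates the distance.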