{"title":"全局启动,局部优化,全局预测:提高不平衡数据的性能","authors":"David A. Cieslak, N. Chawla","doi":"10.1109/ICDM.2008.87","DOIUrl":null,"url":null,"abstract":"Class imbalance is a ubiquitous problem in supervised learning and has gained wide-scale attention in the literature. Perhaps the most prevalent solution is to apply sampling to training data in order improve classifier performance. The typical approach will apply uniform levels of sampling globally. However, we believe that data is typically multi-modal, which suggests sampling should be treated locally rather than globally. It is the purpose of this paper to propose a framework which first identifies meaningful regions of data and then proceeds to find optimal sampling levels within each. This paper demonstrates that a global classifier trained on data locally sampled produces superior rank-orderings on a wide range of real-world and artificial datasets as compared to contemporary global sampling methods.","PeriodicalId":252958,"journal":{"name":"2008 Eighth IEEE International Conference on Data Mining","volume":"28 10","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"73","resultStr":"{\"title\":\"Start Globally, Optimize Locally, Predict Globally: Improving Performance on Imbalanced Data\",\"authors\":\"David A. Cieslak, N. Chawla\",\"doi\":\"10.1109/ICDM.2008.87\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Class imbalance is a ubiquitous problem in supervised learning and has gained wide-scale attention in the literature. Perhaps the most prevalent solution is to apply sampling to training data in order improve classifier performance. The typical approach will apply uniform levels of sampling globally. However, we believe that data is typically multi-modal, which suggests sampling should be treated locally rather than globally. It is the purpose of this paper to propose a framework which first identifies meaningful regions of data and then proceeds to find optimal sampling levels within each. This paper demonstrates that a global classifier trained on data locally sampled produces superior rank-orderings on a wide range of real-world and artificial datasets as compared to contemporary global sampling methods.\",\"PeriodicalId\":252958,\"journal\":{\"name\":\"2008 Eighth IEEE International Conference on Data Mining\",\"volume\":\"28 10\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2008-12-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"73\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2008 Eighth IEEE International Conference on Data Mining\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDM.2008.87\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2008 Eighth IEEE International Conference on Data Mining","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDM.2008.87","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Start Globally, Optimize Locally, Predict Globally: Improving Performance on Imbalanced Data
Class imbalance is a ubiquitous problem in supervised learning and has gained wide-scale attention in the literature. Perhaps the most prevalent solution is to apply sampling to training data in order improve classifier performance. The typical approach will apply uniform levels of sampling globally. However, we believe that data is typically multi-modal, which suggests sampling should be treated locally rather than globally. It is the purpose of this paper to propose a framework which first identifies meaningful regions of data and then proceeds to find optimal sampling levels within each. This paper demonstrates that a global classifier trained on data locally sampled produces superior rank-orderings on a wide range of real-world and artificial datasets as compared to contemporary global sampling methods.