Solving Cross-Selling Problems with Ensemble Learning: A Case Study
X. Guo, Yilong Yin, Guang-Tong Zhou, Cailing Dong
2008 International Conference on Advanced Computer Theory and Engineering (ICACTE), 2008-12-20
DOI: 10.1109/ICACTE.2008.86
Citations: 0
Abstract
This paper presents our solution to the PAKDD Competition 2007 as a case study of cross-selling problems. After briefly describing the data mining task, we discuss several difficulties the task poses from a data mining perspective and then describe our data preprocessing. In the proposed solution, we reduce the class imbalance of the modeling dataset externally (at the data level) by combining under-sampling and over-sampling techniques. In addition, we address cost sensitivity internally (at the algorithm level) by adjusting the parameters of each base learner. Next, we combine the base learners into an ensemble to improve predictive performance. Experimental results on the real-world prediction dataset provided by the PAKDD Competition 2007 show that our solution is effective and efficient, achieving an AUC of 60.73%.
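The abstract names three ingredients (combined under- and over-sampling, cost-sensitive base learners, and an ensemble evaluated by AUC) but does not say which base learners, sampling ratios, or combination rule were used. The following is a minimal, self-contained sketch under stated assumptions, not the authors' method: random under-sampling of the majority class plus random duplication of minority examples, a hypothetical logistic-regression base learner whose `pos_weight` parameter stands in for cost-sensitive tuning, plain score averaging as the ensemble rule, and AUC computed as the pairwise (Mann-Whitney) ranking statistic. All function names, parameter values, and the toy data are hypothetical; the data is synthetic, not the PAKDD 2007 dataset.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# One example: (feature vector, label); label 1 = buyer (minority), 0 = non-buyer.
Sample = Tuple[List[float], int]
Scorer = Callable[[List[float]], float]


def rebalance(data: Sequence[Sample], majority_keep: float,
              minority_copies: int, seed: int = 0) -> List[Sample]:
    """Random under-sampling of the majority class combined with random
    over-sampling of the minority class (hypothetical ratios)."""
    rng = random.Random(seed)
    minority = [s for s in data if s[1] == 1]
    majority = [s for s in data if s[1] == 0]
    # Under-sample: keep a random fraction of the majority class.
    kept = rng.sample(majority, int(len(majority) * majority_keep))
    # Over-sample: draw extra minority examples with replacement so the
    # minority class ends up minority_copies times its original size.
    extra = [rng.choice(minority)
             for _ in range(len(minority) * (minority_copies - 1))]
    balanced = kept + minority + extra
    rng.shuffle(balanced)
    return balanced


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids math.exp overflow).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(data: Sequence[Sample], pos_weight: float = 2.0,
                   lr: float = 0.1, epochs: int = 30, seed: int = 0) -> Scorer:
    """Hypothetical base learner: logistic regression fitted by SGD.
    pos_weight scales the loss on minority examples (cost sensitivity)."""
    rng = random.Random(seed)
    w = [0.0] * len(data[0][0])
    b = 0.0
    rows = list(data)
    for _ in range(epochs):
        rng.shuffle(rows)
        for x, y in rows:
            p = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
            # Gradient of the weighted log-loss: errors on buyers cost more.
            g = (p - y) * (pos_weight if y == 1 else 1.0)
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    # The returned scorer uses the final trained w and b.
    return lambda x: sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))


def train_ensemble(data: Sequence[Sample], n_models: int = 5,
                   majority_keep: float = 0.3, minority_copies: int = 3,
                   pos_weight: float = 2.0) -> List[Scorer]:
    # Each base learner sees a differently seeded rebalanced sample.
    return [train_logistic(rebalance(data, majority_keep, minority_copies, seed=k),
                           pos_weight=pos_weight, seed=k)
            for k in range(n_models)]


def ensemble_score(models: Sequence[Scorer], x: List[float]) -> float:
    # Assumed combination rule: average the base learners' scores.
    return sum(m(x) for m in models) / len(models)


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC = P(random positive outscores random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic, imbalanced toy data: 20 buyers vs 200 non-buyers.
    data = [([rng.gauss(1.0, 1.0), rng.gauss(1.0, 1.0)], 1) for _ in range(20)]
    data += [([rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)], 0) for _ in range(200)]
    models = train_ensemble(data)
    scores = [ensemble_score(models, x) for x, _ in data]
    print(f"training AUC: {auc(scores, [y for _, y in data]):.3f}")
```

Averaging continuous scores, rather than majority-voting hard labels, keeps a full ranking of customers, which is exactly what AUC measures. The pairwise AUC loop is quadratic in dataset size and is meant only to make the definition explicit; a rank-based computation scales better.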