Rule extraction using genetic programming for accurate sales forecasting
Rikard König, U. Johansson
2014 IEEE Symposium on Computational Intelligence and Data Mining (CIDM), 2014-12-01
DOI: 10.1109/CIDM.2014.7008669 (https://doi.org/10.1109/CIDM.2014.7008669)
The purpose of this paper is to propose and evaluate a method for reducing the inherent tendency of genetic programming to overfit small and noisy data sets. In addition, the use of different optimization criteria for symbolic regression is demonstrated. The key idea is to reduce the risk of overfitting noise in the training data by introducing an intermediate predictive model in the process. More specifically, instead of directly evolving a genetic regression model based on labeled training data, the first step is to generate a highly accurate ensemble model. Since ensembles are very robust, the resulting predictions will contain less noise than the original data set. In the second step, an interpretable model is evolved, using the ensemble predictions, instead of the true labels, as the target variable. Experiments on 175 sales forecasting data sets, from one of Sweden's largest wholesale companies, show that the proposed technique obtained significantly better predictive performance, compared to both straightforward use of genetic programming and the standard M5P technique. Naturally, the level of improvement depends critically on the performance of the intermediate ensemble.
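The two-step idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the ensemble type, the genetic programming setup, or the optimization criteria, so everything here is an assumption made for illustration. A bagged ensemble of one-split regression stumps stands in for the "highly accurate ensemble", an ordinary least-squares line stands in for the evolved interpretable model, and the data are synthetic. All function names (`fit_stump`, `fit_bagged_ensemble`, `fit_line`, `extract_via_ensemble`) are hypothetical. The only point carried over from the text is that the interpretable model is trained on the ensemble's predictions rather than on the noisy labels.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression stump on 1-D data by minimising squared error."""
    best = None
    for t in sorted(set(xs))[1:]:  # every threshold leaves both sides non-empty
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # all x values identical: fall back to the mean
        m = mean(ys)
        return lambda x: m
    _, t, lm, rm = best
    return lambda x: lm if x < t else rm


def fit_bagged_ensemble(xs, ys, n_models=50, seed=0):
    """Stand-in ensemble (assumption): average of stumps fit on bootstrap samples."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(m(x) for m in models)


def fit_line(xs, ys):
    """Stand-in interpretable model (assumption): least-squares line, not GP."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def extract_via_ensemble(xs, ys, seed=0):
    """Step 1: fit the ensemble. Step 2: fit the interpretable model on its predictions."""
    ensemble = fit_bagged_ensemble(xs, ys, seed=seed)
    smoothed = [ensemble(x) for x in xs]  # ensemble predictions replace the true labels
    return fit_line(xs, smoothed), smoothed


# Synthetic, made-up "sales" series: a linear trend plus uniform noise.
noise = random.Random(1)
xs = [float(i) for i in range(20)]
ys = [3.0 * x + 10.0 + noise.uniform(-5, 5) for x in xs]

(slope, intercept), smoothed = extract_via_ensemble(xs, ys)
print(f"surrogate model: y = {slope:.2f} * x + {intercept:.2f}")
```

To compare against the direct approach, `fit_line(xs, ys)` fits the same stand-in model on the raw labels. As the abstract notes, any gain from the indirect route depends on how good the intermediate ensemble is, and a stump ensemble on toy data says nothing about the paper's reported results.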