{"title":"针对用户流失的C4.5算法优化研究","authors":"Chao Deng, Zhaohui Ma","doi":"10.1109/CSAIEE54046.2021.9543367","DOIUrl":null,"url":null,"abstract":"Decision tree is a kind of machine learning method which can decide a corresponding result according to the probability of different eigenvalues, The effective decision number constructed can provide help for our data analysis. The generation of decision tree is a recursive process, It mainly uses the optimal partition attribute as the corresponding tree node, and then uses various values of the attribute to construct branches. In this way, until the data reaches a certain purity, the leaf nodes are obtained, and a decision tree in accordance with the rules is constructed. Among the traditional decision tree algorithms, C4.5 algorithm has a gain rate because of its attribute division. This leads to another obvious disadvantage, that is, it has a preference for the attributes with a small number of values, so that the accuracy of the decision tree is often not particularly ideal. In view of this, this paper proposes an improved E-C4.5 algorithm, which combines information gain and information gain rate to generate a new attribute partition criterion. The attribute partition method greatly eliminates the shortcoming of C4.5 algorithm which has a preference for the attributes with a small number of values, and further improves the decision accuracy of decision tree generation. In this paper, the actual data sets are used to verify the accuracy of the decision tree generated by the improved algorithm compared with the traditional C4.5 algorithm.","PeriodicalId":376014,"journal":{"name":"2021 IEEE International Conference on Computer Science, Artificial Intelligence and Electronic Engineering (CSAIEE)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Research on C4.5 Algorithm Optimization For User Churn\",\"authors\":\"Chao Deng, Zhaohui Ma\",\"doi\":\"10.1109/CSAIEE54046.2021.9543367\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Decision tree is a kind of machine learning method which can decide a corresponding result according to the probability of different eigenvalues, The effective decision number constructed can provide help for our data analysis. The generation of decision tree is a recursive process, It mainly uses the optimal partition attribute as the corresponding tree node, and then uses various values of the attribute to construct branches. In this way, until the data reaches a certain purity, the leaf nodes are obtained, and a decision tree in accordance with the rules is constructed. Among the traditional decision tree algorithms, C4.5 algorithm has a gain rate because of its attribute division. This leads to another obvious disadvantage, that is, it has a preference for the attributes with a small number of values, so that the accuracy of the decision tree is often not particularly ideal. In view of this, this paper proposes an improved E-C4.5 algorithm, which combines information gain and information gain rate to generate a new attribute partition criterion. The attribute partition method greatly eliminates the shortcoming of C4.5 algorithm which has a preference for the attributes with a small number of values, and further improves the decision accuracy of decision tree generation. 
In this paper, the actual data sets are used to verify the accuracy of the decision tree generated by the improved algorithm compared with the traditional C4.5 algorithm.\",\"PeriodicalId\":376014,\"journal\":{\"name\":\"2021 IEEE International Conference on Computer Science, Artificial Intelligence and Electronic Engineering (CSAIEE)\",\"volume\":\"25 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-08-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 IEEE International Conference on Computer Science, Artificial Intelligence and Electronic Engineering (CSAIEE)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CSAIEE54046.2021.9543367\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE International Conference on Computer Science, Artificial Intelligence and Electronic Engineering (CSAIEE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CSAIEE54046.2021.9543367","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Research on C4.5 Algorithm Optimization For User Churn
A decision tree is a machine learning method that maps combinations of attribute values to a corresponding predicted outcome, and an effectively constructed decision tree can support data analysis. Decision tree generation is a recursive process: the optimal partition attribute is selected as the current tree node, and branches are constructed for each value of that attribute. This continues until the data at a node reaches a certain purity, at which point a leaf node is produced and a decision tree consistent with the rules is obtained. Among traditional decision tree algorithms, the C4.5 algorithm uses the information gain ratio as its attribute partition criterion. This brings an obvious disadvantage: the criterion is biased toward attributes with a small number of values, so the accuracy of the resulting decision tree is often not ideal. In view of this, this paper proposes an improved E-C4.5 algorithm, which combines information gain and information gain ratio into a new attribute partition criterion. This partition method largely eliminates the C4.5 algorithm's bias toward attributes with few values and further improves the decision accuracy of the generated tree. Real data sets are used to verify the accuracy of the decision tree generated by the improved algorithm against that of the traditional C4.5 algorithm.
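Since the abstract describes attribute selection in terms of information gain and information gain ratio, a minimal sketch of the two criteria may clarify what is being combined. The `combined_criterion` weighting below (including the `alpha` parameter, the attribute names, and the toy churn data) is an illustrative assumption for demonstration only, not the paper's actual E-C4.5 partition rule.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def split_by_attribute(rows, labels, attr_index):
    """Group the labels by the value of the attribute at attr_index."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr_index], []).append(label)
    return groups

def information_gain(rows, labels, attr_index):
    """Information gain of splitting on the given attribute (ID3 criterion)."""
    groups = split_by_attribute(rows, labels, attr_index)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def gain_ratio(rows, labels, attr_index):
    """Gain ratio (C4.5 criterion): information gain divided by split information."""
    groups = split_by_attribute(rows, labels, attr_index)
    split_info = -sum(
        (len(g) / len(labels)) * math.log2(len(g) / len(labels))
        for g in groups.values()
    )
    if split_info == 0.0:  # attribute takes a single value; splitting is useless
        return 0.0
    return information_gain(rows, labels, attr_index) / split_info

def combined_criterion(rows, labels, attr_index, alpha=0.5):
    """Illustrative combined score: a weighted mix of gain and gain ratio.
    The real E-C4.5 combination rule is defined in the paper; this weighting
    is only a placeholder showing how the two measures can be blended."""
    g = information_gain(rows, labels, attr_index)
    gr = gain_ratio(rows, labels, attr_index)
    return alpha * g + (1 - alpha) * gr

# Tiny worked example: 4 samples, 2 hypothetical attributes, binary churn label.
rows = [
    ("low", "monthly"),
    ("low", "yearly"),
    ("high", "monthly"),
    ("high", "yearly"),
]
labels = ["churn", "stay", "churn", "churn"]

for i, name in enumerate(["usage", "plan"]):
    print(name,
          round(information_gain(rows, labels, i), 3),
          round(gain_ratio(rows, labels, i), 3),
          round(combined_criterion(rows, labels, i), 3))
```

In each recursive step, the attribute with the highest score under the chosen criterion would be used as the node's partition attribute, and a branch would be grown for each of its values until the remaining samples reach sufficient purity.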