{"title":"A preordonance-based decision tree method and its parallel implementation in the framework of Map-Reduce","authors":"Hasna Chamlal, Fadwa Aaboub, Tayeb Ouaderhman","doi":"10.1016/j.asoc.2024.112261","DOIUrl":null,"url":null,"abstract":"<div><div>In supervised classification, decision trees are one of the most popular learning algorithms that are employed in many practical applications because of their simplicity, adaptability, and other perks. The development of effective and efficient decision trees remains a major focus in machine learning. Therefore, the scientific literature provides various node splitting measures that can be utilized to produce different decision trees, including Information Gain, Gain Ratio, Average Gain, and Gini Index. This research paper presents a new node splitting metric that is based on preordonance theory. The primary benefit of the new split criterion is its ability to deal with categorical or numerical attributes directly without discretization. Consequently, the “Preordonance-based decision tree” (P-Tree) approach, a powerful technique that generates decision trees using the suggested node splitting measure, is developed. Both multiclass classification problems and imbalanced data sets can be handled by the P-Tree decision tree strategy. Moreover, the over-partitioning problem is addressed by the P-Tree methodology, which introduces a threshold <span><math><mi>ϵ</mi></math></span> as a stopping condition. If the percentage of instances in a node falls below the predetermined threshold, the expansion of the tree will be halted. The performance of the P-Tree procedure is evaluated on fourteen benchmark data sets with different sizes and contrasted with that of five already existing decision tree methods using a variety of evaluation metrics.
The results of the experiments demonstrate that the P-Tree model performs admirably across all of the tested data sets and that it is comparable to the other five decision tree algorithms overall. On the other hand, an ensemble technique called “ensemble P-Tree” offers a reliable remedy to mitigate the instability that is frequently associated with tree-based algorithms. This ensemble method leverages the strengths of the P-Tree approach to enhance predictive performance through collective decision-making. The ensemble P-Tree strategy is comprehensively evaluated by comparing its performance to that of two top-performing ensemble decision tree methodologies. The experimental findings highlight its exceptional performance and competitiveness against other decision tree procedures. Despite the excellent performance of the P-Tree approach, there are still some obstacles that prevent it from handling larger data sets, such as memory restrictions, time complexity, or data complexity. However, parallel computing is effective in resolving this kind of problem. Hence, the MR-P-Tree decision tree technique, a parallel implementation of the P-Tree strategy in the Map-Reduce framework, is further designed. The three parallel procedures MR-SA-S, MR-SP-S, and MR-S-DS for choosing the optimal splitting attributes, choosing the optimal splitting points, and dividing the training data set in parallel, respectively, are the primary basis of the MR-P-Tree methodology. 
Furthermore, several experimental studies are carried out on ten additional data sets to illustrate the viability of the MR-P-Tree technique and its strong parallel performance.</div></div>","PeriodicalId":50737,"journal":{"name":"Applied Soft Computing","volume":null,"pages":null},"PeriodicalIF":7.2000,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied Soft Computing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1568494624010354","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
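The abstract's ε stopping rule (halt expansion when a node's share of the training instances drops below a threshold) can be illustrated with a minimal tree-growing sketch. The preordonance splitting measure itself is not given in this record, so `best_split` below uses Gini impurity as a hypothetical stand-in; `grow`, `best_split`, and the tree's dict representation are illustrative names, not the paper's implementation.

```python
from collections import Counter

def gini(labels):
    # Gini impurity of a label multiset (stand-in for the preordonance score).
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    # Hypothetical exhaustive search over (feature, threshold) pairs.
    best = None
    for f in range(len(rows[0])):
        for v in sorted(set(r[f] for r in rows)):
            left = [i for i, r in enumerate(rows) if r[f] <= v]
            right = [i for i, r in enumerate(rows) if r[f] > v]
            if not left or not right:
                continue
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, v, left, right)
    return best

def grow(rows, labels, total_n, epsilon=0.1):
    # Stopping condition from the abstract: halt expansion when the node's
    # share of the full training set falls below the threshold epsilon.
    if len(rows) / total_n < epsilon or len(set(labels)) == 1:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    found = best_split(rows, labels)
    if found is None:
        return Counter(labels).most_common(1)[0][0]
    _, f, v, left, right = found
    branch = lambda idx: grow([rows[i] for i in idx],
                              [labels[i] for i in idx], total_n, epsilon)
    return {"feature": f, "value": v, "left": branch(left), "right": branch(right)}
```

Raising `epsilon` prunes more aggressively: any node holding less than an `epsilon` fraction of the original training set becomes a majority-class leaf, which is how the abstract says over-partitioning is avoided.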
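The "ensemble P-Tree" is described only as collective decision-making over P-Trees; the record does not state the aggregation rule. A common realization of such a scheme is bootstrap aggregation with majority voting, sketched below under that assumption (`bagged_predict`, `train_fn`, and `predict_fn` are hypothetical names, parameterized so any base learner can stand in for the P-Tree).

```python
import random
from collections import Counter

def bagged_predict(train_fn, predict_fn, rows, labels, query, n_trees=5, seed=0):
    # Assumed scheme: draw a bootstrap sample per tree, train a base model on
    # it, and return the majority vote of the individual predictions.
    rng = random.Random(seed)
    n = len(rows)
    votes = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        model = train_fn([rows[i] for i in idx], [labels[i] for i in idx])
        votes.append(predict_fn(model, query))
    return Counter(votes).most_common(1)[0][0]
```

Averaging over resampled trees is the standard remedy for the instability the abstract mentions: a small perturbation of the training set can change one tree drastically, but rarely flips the majority of an ensemble.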
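The parallel design rests on three MapReduce procedures (MR-SA-S, MR-SP-S, MR-S-DS). Their internals are not given in this record, but the map/reduce pattern behind a split-attribute selection step like MR-SA-S can be sketched: mappers emit per-(attribute, value, class) counts for their data partition, a reducer merges the counts, and the driver scores each attribute. Gini-style weighted impurity again stands in for the preordonance score, and all function names are illustrative.

```python
from collections import defaultdict, Counter
from functools import reduce

def map_partition(partition):
    # Mapper: per-(attribute, value) class counts for one data partition.
    counts = defaultdict(Counter)
    for row, label in partition:
        for a, v in enumerate(row):
            counts[(a, v)][label] += 1
    return counts

def reduce_counts(c1, c2):
    # Reducer: merge two count tables key-by-key.
    for key, ctr in c2.items():
        c1[key].update(ctr)
    return c1

def best_attribute(partitions):
    # Driver: merge all mapper outputs, then score each attribute by the
    # weighted impurity of the groups its values induce (lower is better).
    merged = reduce(reduce_counts,
                    (map_partition(p) for p in partitions),
                    defaultdict(Counter))
    per_attr = defaultdict(list)
    for (a, _v), ctr in merged.items():
        per_attr[a].append(ctr)

    def impurity(ctrs):
        total = sum(sum(c.values()) for c in ctrs)
        s = 0.0
        for c in ctrs:
            n = sum(c.values())
            s += n / total * (1 - sum((x / n) ** 2 for x in c.values()))
        return s

    return min(per_attr, key=lambda a: impurity(per_attr[a]))
```

The point of the pattern is that only small count tables cross partition boundaries, never the raw rows, which is what lets a tree builder scale past single-machine memory limits.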
Citations: 0
Journal introduction:
Applied Soft Computing is an international journal promoting an integrated view of soft computing to solve real-life problems. The focus is to publish the highest-quality research in the application and convergence of Fuzzy Logic, Neural Networks, Evolutionary Computing, Rough Sets, and other similar techniques to address real-world complexities.
Applied Soft Computing is a rolling publication: articles are published as soon as the editor-in-chief has accepted them. Therefore, the web site will continuously be updated with new articles and the publication time will be short.