Somayeh Lotfi, Mohammad Ghasemzadeh, M. Mohsenzadeh, M. Mirzarezaee
International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems. Published 2023-12-01. DOI: 10.1142/s0218488523500423 (https://doi.org/10.1142/s0218488523500423)
Combining Fuzzy Partitioning and Incremental Methods to Construct a Scalable Decision Tree on Large Datasets
The decision tree is a popular classifier that reasons by recursively partitioning the data space. To choose the best attribute for a split, the range of each continuous attribute must first be divided into two or more intervals, and the partitioning criterion is then calculated for each candidate value. Fuzzy partitioning can reduce sensitivity to noise and increase tree stability. Tree-building algorithms also face memory limitations, because they need to keep the entire training dataset in main memory. In this paper, we introduce a fuzzy decision tree (FDT) approach based on fuzzy sets. To avoid storing the entire training dataset in main memory, the algorithm builds FDTs incrementally. Membership functions are generated automatically. Fuzzy Information Gain (FIG) is then used as a fast criterion for selecting the split attribute, and each leaf is expanded using only the instances stored in that leaf. The algorithm is evaluated in terms of accuracy and tree complexity. The results show that the proposed algorithm can overcome memory limitations and balance accuracy against complexity while reducing the complexity of the tree.
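As a rough illustration of how a fuzzy information gain could be computed for one continuous attribute, the sketch below uses triangular membership functions and a membership-weighted entropy. The abstract does not give the paper's exact FIG formula or its membership-generation procedure, so the function names (`triangular`, `fuzzy_entropy`, `fuzzy_information_gain`), the hand-specified fuzzy sets, and the weighting scheme are all assumptions chosen for illustration, not the authors' method.

```python
import math
from typing import Hashable, Sequence, Tuple

# Hypothetical representation: a triangular fuzzy set as (left foot, peak, right foot).
FuzzySet = Tuple[float, float, float]


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Membership of x in a triangular fuzzy set with feet a, c and peak b.

    Setting a == b or b == c gives a left or right shoulder without
    dividing by zero, because the peak is handled first.
    """
    if x == b:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if b < x < c:
        return (c - x) / (c - b)
    return 0.0


def fuzzy_entropy(weights: Sequence[float], labels: Sequence[Hashable]) -> float:
    """Entropy of the class distribution, with each instance counted by its weight."""
    total = sum(weights)
    if total == 0:
        return 0.0
    mass_by_class = {}
    for w, y in zip(weights, labels):
        mass_by_class[y] = mass_by_class.get(y, 0.0) + w
    entropy = 0.0
    for mass in mass_by_class.values():
        if mass > 0:
            p = mass / total
            entropy -= p * math.log2(p)
    return entropy


def fuzzy_information_gain(
    values: Sequence[float],
    labels: Sequence[Hashable],
    fuzzy_sets: Sequence[FuzzySet],
) -> float:
    """Assumed FIG: parent entropy minus the membership-weighted child entropies."""
    parent = fuzzy_entropy([1.0] * len(values), labels)
    memberships = [[triangular(x, *fs) for x in values] for fs in fuzzy_sets]
    grand_total = sum(sum(m) for m in memberships)
    if grand_total == 0:
        return 0.0
    children = sum(
        (sum(m) / grand_total) * fuzzy_entropy(m, labels) for m in memberships
    )
    return parent - children


if __name__ == "__main__":
    # Toy data: low values belong to class "a", high values to class "b".
    xs = [0.0, 0.0, 10.0, 10.0]
    ys = ["a", "a", "b", "b"]
    low_high = [(0.0, 0.0, 10.0), (0.0, 10.0, 10.0)]  # two shoulder-shaped sets
    print(fuzzy_information_gain(xs, ys, low_high))
```

In an incremental setting such as the one the abstract describes, a function like this would be called on the instances held in a single leaf rather than on the whole training set; how instances are batched and when a leaf is expanded is not specified in the abstract and is left out of the sketch.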