An implementation of a multivariate discretization for supervised learning using Forestdisc
Maissae Haddouchi, A. Berrado
Proceedings of the 13th International Conference on Intelligent Systems: Theories and Applications, 2020-09-23
DOI: 10.1145/3419604.3419772 (https://doi.org/10.1145/3419604.3419772)
Citations: 1
Abstract
Discretization is a key pre-processing step in machine learning that transforms continuous attributes into discrete ones; many methods for it are available in the literature. This work presents ForestDisc, a framework that discretizes data using a supervised, multivariate, and hybrid approach. It first applies a splitting process that relies on a tree ensemble learner to generate a large set of cut points. It then applies a merging process based on moment-matching optimization to reduce this set to a smaller, representative one. ForestDisc is a non-parametric discretizer in the sense that it does not require the user to supply any initial parameter settings. We implemented the ForestDisc algorithm in the "ForestDisc" R package.
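To make the split-then-merge idea concrete, the following is a minimal Python sketch of the two stages. It is not the ForestDisc R package and does not reproduce its API. Every name here (`split_step`, `merge_step`, `to_bin`, and the parameters `n_trees`, `depth`, `order`, `k`) is hypothetical. The sketch assumes small Gini-based trees grown on bootstrap samples as the ensemble, and it replaces the moment-matching optimizer with a brute-force search over candidate subsets, which is only practical for small candidate pools.

```python
import random
from bisect import bisect_right
from itertools import combinations
from statistics import mean


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(rows, labels):
    """Return (score, attribute, threshold) with the lowest weighted Gini, or None."""
    best = None
    for j in range(len(rows[0])):
        pairs = sorted(((r[j], y) for r, y in zip(rows, labels)), key=lambda p: p[0])
        for i in range(1, len(pairs)):
            if pairs[i - 1][0] == pairs[i][0]:
                continue  # no threshold fits between equal values
            left = [y for _, y in pairs[:i]]
            right = [y for _, y in pairs[i:]]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
            if best is None or score < best[0]:
                best = (score, j, (pairs[i - 1][0] + pairs[i][0]) / 2)
    return best


def grow(rows, labels, depth, cuts):
    """Grow one small tree, recording each threshold under its attribute index."""
    if depth == 0 or len(set(labels)) < 2:
        return
    split = best_split(rows, labels)
    if split is None:
        return
    _, j, thr = split
    cuts[j].append(thr)
    for side in (True, False):
        part = [(r, y) for r, y in zip(rows, labels) if (r[j] <= thr) == side]
        grow([r for r, _ in part], [y for _, y in part], depth - 1, cuts)


def split_step(rows, labels, n_trees=30, depth=2, seed=0):
    """Splitting stage: pool cut points per attribute over bootstrap-grown trees."""
    rng = random.Random(seed)
    n = len(rows)
    cuts = {j: [] for j in range(len(rows[0]))}
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        grow([rows[i] for i in idx], [labels[i] for i in idx], depth, cuts)
    return cuts


def raw_moments(xs, order):
    """First `order` raw moments of a sequence of numbers."""
    return [mean(x ** p for x in xs) for p in range(1, order + 1)]


def merge_step(pool, k, order=4):
    """Merging stage (brute force): choose k cut points whose moments match the pool's."""
    if not pool:
        return []
    target = raw_moments(pool, order)
    candidates = sorted(set(pool))
    k = min(k, len(candidates))

    def loss(subset):
        return sum((m - t) ** 2 for m, t in zip(raw_moments(subset, order), target))

    return list(min(combinations(candidates, k), key=loss))


def to_bin(x, cut_points):
    """Map a continuous value to the index of its interval."""
    return bisect_right(cut_points, x)
```

Because each tree chooses among all attributes at every node, the cut points collected for one attribute depend on splits made on the others, which is the sense in which this sketch is multivariate. A typical call would be `cuts = split_step(rows, labels)` followed by `merge_step(cuts[j], k)` for each attribute `j`. The number of retained cut points `k` is supplied by hand here, whereas the abstract describes ForestDisc as requiring no such setting from the user.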