Hiroki Nakahara, Akira Jinguji, S. Sato, Tsutomu Sasao
A Random Forest Using a Multi-valued Decision Diagram on an FPGA
2017 IEEE 47th International Symposium on Multiple-Valued Logic (ISMVL), published 2017-05-22
DOI: 10.1109/ISMVL.2017.40 (https://doi.org/10.1109/ISMVL.2017.40)
Citations: 23
Abstract
A random forest (RF) is an ensemble machine learning algorithm used for classification and regression. It consists of multiple decision trees built from randomly sampled data. Compared with other machine learning algorithms, the RF is simple and offers fast learning and identification. It is widely used in various recognition systems. The conventional RF consists of binary decision trees (BDTs), whereas in this paper we use multi-valued decision diagrams (MDDs). In an MDD, each variable appears only once on a path; in a BDT, a variable may appear multiple times. Since the path length in the MDD is short, it can be evaluated at high speed. The disadvantage is that the number of nodes in the MDD increases as O(2^N), where N denotes the number of input variables. Fortunately, random forests encourage using a small N for each tree in order to avoid overfitting. Therefore, for the several data sets used in the experiments, the number of nodes did not increase even when MDDs were used. To reduce development time, we used the Altera SDK for OpenCL (AOCL), a high-level synthesis tool. To accelerate RF classification with the AOCL, we propose a fully pipelined architecture that increases memory bandwidth by using the on-chip memories of the FPGA. We also apply an optimal-precision fixed-point representation instead of a 32-bit floating-point one. We compared the performance with CPU and GPU implementations. In terms of LPS (lookups per second), the FPGA realization was 10.7 times faster than the GPU one and 14.0 times faster than the CPU one. In terms of LPS per power consumption, the FPGA realization was 61.3 times better than the GPU one and 12.1 times better than the CPU one.
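
The path-length difference between a BDT and an MDD can be illustrated with a small software sketch. Everything below is an assumption made for illustration: the tuple encoding of nodes, the thresholds, the class labels, and the names `eval_bdt` and `eval_mdd` are hypothetical, and the sketch assumes that each MDD edge corresponds to one interval of a variable's value range. It is not the FPGA architecture or the data format described in the paper.

```python
from bisect import bisect_right

# Hypothetical binary decision tree (BDT).
# Internal node: (feature_index, threshold, left, right); leaf: int class label.
# Go left when x[feature_index] < threshold. Variable 0 is tested twice on a path.
BDT = (0, 5.0,
       (0, 2.0, 0, (1, 1.0, 1, 2)),
       (0, 8.0, (1, 1.0, 2, 0), 1))

def eval_bdt(node, x):
    """Walk the BDT; return (class_label, number_of_nodes_visited)."""
    steps = 0
    while not isinstance(node, int):
        feature, threshold, left, right = node
        node = left if x[feature] < threshold else right
        steps += 1
    return node, steps

# Equivalent hypothetical multi-valued decision diagram (MDD).
# Internal node: (feature_index, sorted_thresholds, children), where
# children[i] handles the i-th interval of the variable (assumed encoding).
# Each variable is tested at most once on any path.
MDD = (0, [2.0, 5.0, 8.0],
       [0, (1, [1.0], [1, 2]), (1, [1.0], [2, 0]), 1])

def eval_mdd(node, x):
    """Walk the MDD; return (class_label, number_of_nodes_visited)."""
    steps = 0
    while not isinstance(node, int):
        feature, thresholds, children = node
        # bisect_right counts thresholds <= value, which selects the interval.
        node = children[bisect_right(thresholds, x[feature])]
        steps += 1
    return node, steps

if __name__ == "__main__":
    sample = (3.0, 0.5)
    print("BDT:", eval_bdt(BDT, sample))
    print("MDD:", eval_mdd(MDD, sample))
```

In this toy case both structures return the same label for every input, but the MDD never visits more nodes than there are variables (two here), while the BDT may visit three because it tests variable 0 twice. The cost is wider nodes: the root of the MDD has four outgoing edges instead of two.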