基于FPGA的在线决策树学习的高效可扩展加速研究

2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) Pub Date : 2019-04-01 DOI:10.1109/FCCM.2019.00032

Zhe Lin, Sharad Sinha, Wei Zhang

{"title":"基于FPGA的在线决策树学习的高效可扩展加速研究","authors":"Zhe Lin, Sharad Sinha, Wei Zhang","doi":"10.1109/FCCM.2019.00032","DOIUrl":null,"url":null,"abstract":"Decision trees are machine learning models commonly used in various application scenarios. In the era of big data, traditional decision tree induction algorithms are not suitable for learning large-scale datasets due to their stringent data storage requirement. Online decision tree learning algorithms have been devised to tackle this problem by concurrently training with incoming samples and providing inference results. However, even the most up-to-date online tree learning algorithms still suffer from either high memory usage or high computational intensity with dependency and long latency, making them challenging to implement in hardware. To overcome these difficulties, we introduce a new quantile-based algorithm to improve the induction of the Hoeffding tree, one of the state-of-the-art online learning models. The proposed algorithm is light-weight in terms of both memory and computational demand, while still maintaining high generalization ability. A series of optimization techniques dedicated to the proposed algorithm have been investigated from the hardware perspective, including coarse-grained and fine-grained parallelism, dynamic and memory-based resource sharing, pipelining with data forwarding. We further present a high-performance, hardware-efficient and scalable online decision tree learning system on a field-programmable gate array (FPGA) with system-level optimization techniques. Experimental results show that our proposed algorithm outperforms the state-of-the-art Hoeffding tree learning method, leading to 0.05% to 12.3% improvement in inference accuracy. Real implementation of the complete learning system on the FPGA demonstrates a 384x to 1581x speedup in execution time over the state-of-the-art design.","PeriodicalId":116955,"journal":{"name":"2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":"{\"title\":\"Towards Efficient and Scalable Acceleration of Online Decision Tree Learning on FPGA\",\"authors\":\"Zhe Lin, Sharad Sinha, Wei Zhang\",\"doi\":\"10.1109/FCCM.2019.00032\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Decision trees are machine learning models commonly used in various application scenarios. In the era of big data, traditional decision tree induction algorithms are not suitable for learning large-scale datasets due to their stringent data storage requirement. Online decision tree learning algorithms have been devised to tackle this problem by concurrently training with incoming samples and providing inference results. However, even the most up-to-date online tree learning algorithms still suffer from either high memory usage or high computational intensity with dependency and long latency, making them challenging to implement in hardware. To overcome these difficulties, we introduce a new quantile-based algorithm to improve the induction of the Hoeffding tree, one of the state-of-the-art online learning models. The proposed algorithm is light-weight in terms of both memory and computational demand, while still maintaining high generalization ability. A series of optimization techniques dedicated to the proposed algorithm have been investigated from the hardware perspective, including coarse-grained and fine-grained parallelism, dynamic and memory-based resource sharing, pipelining with data forwarding. We further present a high-performance, hardware-efficient and scalable online decision tree learning system on a field-programmable gate array (FPGA) with system-level optimization techniques. Experimental results show that our proposed algorithm outperforms the state-of-the-art Hoeffding tree learning method, leading to 0.05% to 12.3% improvement in inference accuracy. Real implementation of the complete learning system on the FPGA demonstrates a 384x to 1581x speedup in execution time over the state-of-the-art design.\",\"PeriodicalId\":116955,\"journal\":{\"name\":\"2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)\",\"volume\":\"6 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-04-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"8\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/FCCM.2019.00032\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FCCM.2019.00032","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 8

摘要

决策树是在各种应用场景中常用的机器学习模型。在大数据时代，传统的决策树归纳算法对数据存储要求严格，不适合学习大规模数据集。在线决策树学习算法通过对输入样本进行并行训练并提供推理结果来解决这一问题。然而，即使是最新的在线树学习算法仍然存在高内存使用或高计算强度、依赖性和长延迟的问题，这使得它们很难在硬件上实现。为了克服这些困难，我们引入了一种新的基于分位数的算法来改进Hoeffding树的归纳，Hoeffding树是最先进的在线学习模型之一。该算法在内存和计算量方面都是轻量级的，同时仍保持较高的泛化能力。从硬件角度研究了该算法的一系列优化技术，包括粗粒度和细粒度并行、动态和基于内存的资源共享、数据转发的流水线。我们进一步提出了一个高性能，硬件高效和可扩展的在线决策树学习系统在现场可编程门阵列(FPGA)与系统级优化技术。实验结果表明，我们提出的算法优于最先进的Hoeffding树学习方法，推理准确率提高了0.05%至12.3%。完整的学习系统在FPGA上的实际实现表明，与最先进的设计相比，执行时间加快了384倍到1581倍。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Towards Efficient and Scalable Acceleration of Online Decision Tree Learning on FPGA

Decision trees are machine learning models commonly used in various application scenarios. In the era of big data, traditional decision tree induction algorithms are not suitable for learning large-scale datasets due to their stringent data storage requirement. Online decision tree learning algorithms have been devised to tackle this problem by concurrently training with incoming samples and providing inference results. However, even the most up-to-date online tree learning algorithms still suffer from either high memory usage or high computational intensity with dependency and long latency, making them challenging to implement in hardware. To overcome these difficulties, we introduce a new quantile-based algorithm to improve the induction of the Hoeffding tree, one of the state-of-the-art online learning models. The proposed algorithm is light-weight in terms of both memory and computational demand, while still maintaining high generalization ability. A series of optimization techniques dedicated to the proposed algorithm have been investigated from the hardware perspective, including coarse-grained and fine-grained parallelism, dynamic and memory-based resource sharing, pipelining with data forwarding. We further present a high-performance, hardware-efficient and scalable online decision tree learning system on a field-programmable gate array (FPGA) with system-level optimization techniques. Experimental results show that our proposed algorithm outperforms the state-of-the-art Hoeffding tree learning method, leading to 0.05% to 12.3% improvement in inference accuracy. Real implementation of the complete learning system on the FPGA demonstrates a 384x to 1581x speedup in execution time over the state-of-the-art design.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)

自引率

0.00%

发文量