Scalable Hardware Architecture for Fast Gradient Boosted Tree Training

JCR: Q4 (Engineering)

IPSJ Transactions on System LSI Design Methodology, pp. 11-20. Pub Date: 2021-01-01. DOI: 10.2197/ipsjtsldm.14.11

Tamon Sadasue, Takuya Tanaka, Ryosuke Kasahara, Arief Darmawan, T. Isshiki

Citations: 0

Abstract

Gradient Boosted Tree is a powerful machine learning method that supports both classification and regression, and is widely used in fields requiring high-precision prediction, particularly for various types of tabular data sets. Owing to the recent increase in data size, the number of attributes, and the demand for frequent model updates, fast and efficient training is required. FPGA is suitable for power-efficient acceleration because it can realize a domain-specific hardware architecture; however, it is necessary to flexibly support many hyper-parameters to adapt to various dataset sizes, dataset properties, and system limitations such as memory capacity and logic capacity. We introduce a fully pipelined hardware implementation of Gradient Boosted Tree training and a design framework that enables a versatile hardware system description with high performance and flexibility to realize highly parameterized machine learning models. Experimental results show that our FPGA implementation achieves 11 to 33 times faster performance and more than 300 times higher power efficiency than a state-of-the-art GPU-accelerated software implementation.

Source journal

IPSJ Transactions on System LSI Design Methodology (Engineering: Electrical and Electronic Engineering)

CiteScore

1.20

Self-citation rate

0.00%

Articles published