Replanting Your Forest: NVM-friendly Bagging Strategy for Random Forest

2019 IEEE Non-Volatile Memory Systems and Applications Symposium (NVMSA) Pub Date : 2019-08-01 DOI:10.1109/NVMSA.2019.8863525

Y. Ho, Chun-Feng Wu, Ming-Chang Yang, Tseng-Yi Chen, Yuan-Hao Chang


Citations: 10

Abstract

Random forest is effective and accurate in making predictions for classification and regression problems, which constitute the majority of machine learning applications and systems today. However, as data are generated explosively in the big data era, many machine learning algorithms, including the random forest algorithm, may have difficulty maintaining and processing all the required data in main memory. Instead, intensive data movement (i.e., data swapping) between the faster-but-smaller main memory and the slower-but-larger secondary storage may occur excessively and severely degrade performance. To address this challenge, emerging non-volatile memory (NVM) technologies are widely expected to replace traditional random access memory (RAM) and to enable a larger-than-ever main memory space, because of their higher cell density, lower power consumption, and read performance comparable to that of traditional RAM. Nevertheless, the limited write endurance of NVM cells and the read-write asymmetry of NVMs may still limit the feasibility of running machine learning algorithms directly on NVMs. This dilemma motivates this study to develop an NVM-friendly bagging strategy for the random forest algorithm, which trades the "randomness" of the sampled data for reduced data movement in the memory hierarchy without hurting prediction accuracy. The evaluation results show that the proposed design could save up to 72% of the write accesses on representative traces with nearly no degradation in prediction accuracy.
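The trade-off named in the abstract, giving up some sampling randomness in exchange for less data movement, can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not describe. It only contrasts standard bootstrap sampling (uniform draws with replacement) with a hypothetical block-local variant. The function names, the block size, and the "block switches" count used as a rough proxy for data movement are all assumptions made for illustration.

```python
import random


def bootstrap_indices(n, rng):
    """Standard bagging: draw n row indices uniformly with replacement."""
    return [rng.randrange(n) for _ in range(n)]


def block_local_indices(n, block_size, rng):
    """Hypothetical locality-aware variant (an assumption, not the paper's method).

    Picks a block uniformly at random, then samples rows with replacement
    inside that block before moving on to another randomly chosen block.
    """
    n_blocks = (n + block_size - 1) // block_size
    out = []
    while len(out) < n:
        b = rng.randrange(n_blocks)
        start = b * block_size
        end = min(start + block_size, n)
        take = min(end - start, n - len(out))
        out.extend(rng.randrange(start, end) for _ in range(take))
    return out


def block_switches(indices, block_size):
    """Count consecutive accesses that land in different blocks.

    This is only a crude stand-in for data movement between memory levels.
    """
    blocks = [i // block_size for i in indices]
    return sum(1 for a, b in zip(blocks, blocks[1:]) if a != b)


rng = random.Random(0)
n, block_size = 10_000, 100
plain = bootstrap_indices(n, rng)
local = block_local_indices(n, block_size, rng)

print("block switches, standard bootstrap:", block_switches(plain, block_size))
print("block switches, block-local sample:", block_switches(local, block_size))
```

In this toy setting the block-local sample changes blocks far less often than the uniform bootstrap, at the cost of a less uniform sample. Whether accuracy is preserved under such a scheme is exactly the kind of question the paper's evaluation addresses; the sketch makes no claim about it.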


