Memristive Boltzmann machine: A hardware accelerator for combinatorial optimization and deep learning

2016 IEEE International Symposium on High Performance Computer Architecture (HPCA). Pub Date: 2016-03-12. DOI: 10.1109/HPCA.2016.7446049

M. N. Bojnordi, Engin Ipek


Citations: 185

Abstract

The Boltzmann machine is a massively parallel computational model capable of solving a broad class of combinatorial optimization problems. In recent years, it has been successfully applied to training deep machine learning models on massive datasets. High-performance implementations of the Boltzmann machine using GPUs, MPI-based HPC clusters, and FPGAs have been proposed in the literature. Regrettably, the required all-to-all communication among the processing units limits the performance of these efforts. This paper examines a new class of hardware accelerators for large-scale combinatorial optimization and deep learning based on memristive Boltzmann machines. A massively parallel, memory-centric hardware accelerator is proposed based on recently developed resistive RAM (RRAM) technology. The proposed accelerator exploits the electrical properties of RRAM to realize in situ, fine-grained parallel computation within memory arrays, thereby eliminating the need for exchanging data between the memory cells and the computational units. Two classical optimization problems, graph partitioning and Boolean satisfiability, and a deep belief network application are mapped onto the proposed hardware. Compared to a multicore system, the proposed accelerator achieves 57x higher performance and 25x lower energy with virtually no loss in the quality of the solution to the optimization problems. The memristive accelerator is also compared against an RRAM-based processing-in-memory (PIM) system, with respective performance and energy improvements of 6.89x and 5.2x.
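
To make the computational model concrete, the following is a minimal software sketch of a Boltzmann machine used as a combinatorial optimizer. It is not the paper's RRAM design and says nothing about its in situ hardware. It assumes the standard binary-state energy function E(x) = -1/2 Σ w_ij x_i x_j - Σ b_i x_i, uses max-cut on a symmetric graph as a simple stand-in for graph partitioning, and applies a geometric cooling schedule. All function names, parameters, and the 4-node ring graph are hypothetical choices made for illustration.

```python
import math
import random


def maxcut_to_boltzmann(adjacency):
    """Map a max-cut instance to Boltzmann machine weights and biases.

    With binary states x_i in {0, 1} marking which side node i is on,
    w_ij = -2 * a_ij and b_i = sum_j a_ij make E(x) equal to -cut(x).
    Assumes a symmetric adjacency matrix.
    """
    n = len(adjacency)
    weights = [[-2.0 * adjacency[i][j] if i != j else 0.0 for j in range(n)]
               for i in range(n)]
    biases = [float(sum(adjacency[i][j] for j in range(n) if j != i))
              for i in range(n)]
    return weights, biases


def energy(weights, biases, state):
    """E(x) = -1/2 * sum_ij w_ij x_i x_j - sum_i b_i x_i."""
    n = len(state)
    pair = sum(weights[i][j] * state[i] * state[j]
               for i in range(n) for j in range(n))
    return -0.5 * pair - sum(b * x for b, x in zip(biases, state))


def cut_size(adjacency, state):
    """Total weight of edges whose endpoints are on different sides."""
    n = len(state)
    return sum(adjacency[i][j]
               for i in range(n) for j in range(i + 1, n)
               if state[i] != state[j])


def anneal(weights, biases, sweeps=200, t_start=2.0, t_end=0.05, seed=0):
    """Sequential stochastic unit updates under a geometric cooling schedule."""
    rng = random.Random(seed)
    n = len(biases)
    state = [rng.randint(0, 1) for _ in range(n)]
    best, best_e = state[:], energy(weights, biases, state)
    for s in range(sweeps):
        t = t_start * (t_end / t_start) ** (s / max(sweeps - 1, 1))
        for i in range(n):
            # Energy gap E(x_i = 0) - E(x_i = 1) given the other units.
            gap = biases[i] + sum(weights[i][j] * state[j]
                                  for j in range(n) if j != i)
            z = max(-60.0, min(60.0, gap / t))  # clamp to keep exp() in range
            p_on = 1.0 / (1.0 + math.exp(-z))
            state[i] = 1 if rng.random() < p_on else 0
        e = energy(weights, biases, state)
        if e < best_e:
            best, best_e = state[:], e
    return best, best_e


if __name__ == "__main__":
    # Hypothetical 4-node ring graph: 0-1-2-3-0.
    ring = [[0, 1, 0, 1],
            [1, 0, 1, 0],
            [0, 1, 0, 1],
            [1, 0, 1, 0]]
    w, b = maxcut_to_boltzmann(ring)
    side, e = anneal(w, b)
    print("partition:", side, "cut size:", cut_size(ring, side))
```

In this sketch each unit update reads the states of all other units through one row of the weight matrix. That row-wise sum is the all-to-all dependency the abstract identifies as the bottleneck for GPU, cluster, and FPGA implementations.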

