XCelHD: An Efficient GPU-Powered Hyperdimensional Computing with Parallelized Training

2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC) | Pub Date: 2022-01-17 | DOI: 10.1109/ASP-DAC52403.2022.9712549

Jaeyoung Kang, Behnam Khaleghi, Yeseong Kim, T. Simunic


Citations: 12

Abstract


XCelHD: An Efficient GPU-Powered Hyperdimensional Computing with Parallelized Training

Hyperdimensional Computing (HDC) is an emerging lightweight machine learning method that serves as an alternative to deep learning. One of its key strengths is that it can be accelerated in hardware, since it offers massive parallelism. Prior work has focused primarily on FPGAs and ASICs, which do not provide the seamless flexibility required for HDC applications. The few studies that attempted GPU designs are inefficient, partly because HDC's bit-level operations make it complex to accelerate on GPUs. In addition, HDC training has exhibited low hardware utilization due to sequential operations. In this paper, we present XCelHD, a high-performance GPU-powered framework for HDC. XCelHD uses a novel training method to maximize the training speed of the HDC model while fully utilizing the hardware. We propose memory optimization strategies specialized for GPU-based HDC that minimize both the access time to the different memory subsystems and redundant operations. We show that the proposed training method reduces the number of training epochs required to reach comparable accuracy by a factor of four. Our evaluation on the NVIDIA Jetson TX2 shows that XCelHD is up to $35\times$ faster than the state-of-the-art TensorFlow-based HDC implementation.
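For readers unfamiliar with the bit-level operations the abstract refers to, the following is a minimal CPU-only sketch of a generic binary HDC classifier. It is not XCelHD's GPU implementation or its training method, neither of which is described on this page. The dimensionality `D`, the position/level encoding scheme, and all function names are illustrative assumptions.

```python
import random

D = 2048  # hypervector dimensionality (illustrative assumption)

def random_hv(rng):
    """Draw a random binary hypervector, stored as a D-bit Python int."""
    return rng.getrandbits(D)

def bind(a, b):
    """Bind two hypervectors with bitwise XOR."""
    return a ^ b

def hamming(a, b):
    """Hamming distance: XOR followed by a population count."""
    return bin(a ^ b).count("1")

def bundle(hvs):
    """Bundle hypervectors with a per-bit majority vote (ties resolve to 0)."""
    threshold = len(hvs) / 2
    out = 0
    for bit in range(D):
        ones = sum((hv >> bit) & 1 for hv in hvs)
        if ones > threshold:
            out |= 1 << bit
    return out

def encode(sample, positions, levels):
    """Bind each feature's position vector to its level vector, then bundle."""
    return bundle([bind(positions[i], levels[v]) for i, v in enumerate(sample)])

def train(samples, labels, positions, levels):
    """Single-pass training: bundle the encoded samples of each class."""
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(encode(sample, positions, levels))
    return {label: bundle(hvs) for label, hvs in by_class.items()}

def classify(sample, model, positions, levels):
    """Return the class whose hypervector is nearest in Hamming distance."""
    query = encode(sample, positions, levels)
    return min(model, key=lambda label: hamming(query, model[label]))
```

In this sketch every step reduces to XOR, bit counting, and per-bit voting over wide bit vectors, and each bit position is independent of the others. That independence is the kind of parallelism that makes HDC a candidate for hardware acceleration, while the narrow bit-level data type is a poor fit for arithmetic units built around wider numeric types.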

