GPGPU上的皮质架构

GPGPU-3 Pub Date : 2010-03-14 DOI:10.1145/1735688.1735693

Andrew Nere, Mikko H. Lipasti

{"title":"GPGPU上的皮质架构","authors":"Andrew Nere, Mikko H. Lipasti","doi":"10.1145/1735688.1735693","DOIUrl":null,"url":null,"abstract":"As the number of devices available per chip continues to increase, the computational potential of future computer architectures grows likewise. While this is a clear benefit for future computing devices, future chips will also likely suffer from more faulty devices and increased power consumption. It is also likely that these chips will be difficult to program if the current trend of adding more parallel cores continues to follow in the future. However, recent advances in neuroscientific understanding make parallel computing devices modeled after the human neocortex a plausible, attractive, fault-tolerant, and energy-efficient possibility.\n In this paper we describe a GPGPU extension to an intelligent model based on the mammalian neocortex. The GPGPU is a readily-available architecture that fits well with the parallel cortical architecture inspired by the basic building blocks of the human brain. Using NVIDIA's CUDA framework, we have achieved up to 273x speedup over our unoptimized C++ serial implementation. We also consider two inefficiencies inherent to our initial design: multiple kernel-launch overhead and poor utilization of GPGPU resources. We propose using a software work-queue structure to solve the former, and pipelining the cortical architecture during training phase for the latter. Additionally, from our success in extending our model to the GPU, we speculate the necessary hardware requirements for simulating the computational abilities of mammalian brains.","PeriodicalId":381071,"journal":{"name":"GPGPU-3","volume":"691 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"15","resultStr":"{\"title\":\"Cortical architectures on a GPGPU\",\"authors\":\"Andrew Nere, Mikko H. Lipasti\",\"doi\":\"10.1145/1735688.1735693\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"As the number of devices available per chip continues to increase, the computational potential of future computer architectures grows likewise. While this is a clear benefit for future computing devices, future chips will also likely suffer from more faulty devices and increased power consumption. It is also likely that these chips will be difficult to program if the current trend of adding more parallel cores continues to follow in the future. However, recent advances in neuroscientific understanding make parallel computing devices modeled after the human neocortex a plausible, attractive, fault-tolerant, and energy-efficient possibility.\\n In this paper we describe a GPGPU extension to an intelligent model based on the mammalian neocortex. The GPGPU is a readily-available architecture that fits well with the parallel cortical architecture inspired by the basic building blocks of the human brain. Using NVIDIA's CUDA framework, we have achieved up to 273x speedup over our unoptimized C++ serial implementation. We also consider two inefficiencies inherent to our initial design: multiple kernel-launch overhead and poor utilization of GPGPU resources. We propose using a software work-queue structure to solve the former, and pipelining the cortical architecture during training phase for the latter. Additionally, from our success in extending our model to the GPU, we speculate the necessary hardware requirements for simulating the computational abilities of mammalian brains.\",\"PeriodicalId\":381071,\"journal\":{\"name\":\"GPGPU-3\",\"volume\":\"691 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2010-03-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"15\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"GPGPU-3\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/1735688.1735693\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"GPGPU-3","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1735688.1735693","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 15

摘要

随着每个芯片上可用设备的数量不断增加，未来计算机体系结构的计算潜力也会随之增长。虽然这对未来的计算设备来说是一个明显的好处，但未来的芯片也可能会受到更多故障设备和更高功耗的影响。如果目前增加更多并行核的趋势在未来继续下去，这些芯片也很可能难以编程。然而，神经科学的最新进展使模拟人类新皮层的并行计算设备成为一种可行的、有吸引力的、容错的、节能的可能性。本文描述了一种基于哺乳动物新皮层的智能模型的GPGPU扩展。GPGPU是一种易于使用的架构，它非常适合受人脑基本构建块启发的并行皮质架构。使用NVIDIA的CUDA框架，我们在未优化的c++串行实现中实现了高达273x的加速。我们还考虑了初始设计固有的两个低效率:多个内核启动开销和GPGPU资源的低利用率。我们建议使用软件工作队列结构来解决前者，并在训练阶段对皮质结构进行流水线化。此外，从我们成功地将我们的模型扩展到GPU，我们推测了模拟哺乳动物大脑计算能力的必要硬件要求。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Cortical architectures on a GPGPU

As the number of devices available per chip continues to increase, the computational potential of future computer architectures grows likewise. While this is a clear benefit for future computing devices, future chips will also likely suffer from more faulty devices and increased power consumption. It is also likely that these chips will be difficult to program if the current trend of adding more parallel cores continues to follow in the future. However, recent advances in neuroscientific understanding make parallel computing devices modeled after the human neocortex a plausible, attractive, fault-tolerant, and energy-efficient possibility. In this paper we describe a GPGPU extension to an intelligent model based on the mammalian neocortex. The GPGPU is a readily-available architecture that fits well with the parallel cortical architecture inspired by the basic building blocks of the human brain. Using NVIDIA's CUDA framework, we have achieved up to 273x speedup over our unoptimized C++ serial implementation. We also consider two inefficiencies inherent to our initial design: multiple kernel-launch overhead and poor utilization of GPGPU resources. We propose using a software work-queue structure to solve the former, and pipelining the cortical architecture during training phase for the latter. Additionally, from our success in extending our model to the GPU, we speculate the necessary hardware requirements for simulating the computational abilities of mammalian brains.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

GPGPU-3

自引率

0.00%

发文量