A highly efficient implementation of back propagation algorithm using matrix instruction set architecture

M. Soliman, S. Mohamed
Neural Parallel Sci. Comput., journal article, published 2007-03-01
DOI: 10.5555/1315424.1315425 (https://doi.org/10.5555/1315424.1315425)
Citations: 0

Abstract

The Back Propagation (BP) training algorithm has been the subject of intensive research aimed at exploiting its parallelism to reduce training time on complex problems. A modified version of BP based on matrix-matrix multiplication was proposed for parallel processing. This paper discusses the implementation of Matrix Back Propagation (MBP) using scalar, vector, and matrix instruction set architectures (ISAs). It also shows that the performance of MBP improves when switching from a scalar to a vector ISA and from a vector to a matrix ISA. On a practical application, speech recognition, loop unrolling on the scalar ISA speeds up neural network training by a factor of 1.83 over the plain scalar ISA. On eight parallel lanes, the speedups of the vector, unrolled vector, and matrix ISAs are 10.33, 11.88, and 15.36, respectively, where the maximum theoretical speedup is 16. Our results show that the matrix ISA delivers near-optimal performance because it reuses loaded data, reduces loop overhead, and overlaps memory operations with arithmetic operations.
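To make the "close to the optimal" claim concrete, the reported eight-lane speedups can be expressed as fractions of the stated theoretical maximum of 16. The sketch below is only illustrative arithmetic on the figures quoted in the abstract. The names `REPORTED_SPEEDUPS` and `fraction_of_peak` are hypothetical, and it assumes all three speedups are measured against the same scalar baseline, which the abstract does not state explicitly.

```python
# Eight-lane speedups as quoted in the abstract.
# Assumption: all three are relative to the same scalar ISA baseline.
REPORTED_SPEEDUPS = {
    "vector": 10.33,
    "unrolled vector": 11.88,
    "matrix": 15.36,
}

# Maximum theoretical speedup stated in the abstract for eight parallel lanes.
MAX_THEORETICAL_SPEEDUP = 16.0


def fraction_of_peak(speedup, peak=MAX_THEORETICAL_SPEEDUP):
    """Return the achieved speedup as a fraction of the theoretical maximum."""
    return speedup / peak


if __name__ == "__main__":
    for isa, speedup in REPORTED_SPEEDUPS.items():
        share = fraction_of_peak(speedup)
        print(f"{isa:>16}: {speedup:5.2f}x is {share:.1%} of the maximum")
```

Under these assumptions, the vector ISA reaches roughly 65% of the theoretical maximum, the unrolled vector ISA roughly 74%, and the matrix ISA 96%.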