Wavefront parallelization of recurrent neural networks on multi-core architectures

Robin Kumar Sharma, Marc Casas
{"title":"Wavefront parallelization of recurrent neural networks on multi-core architectures","authors":"Robin Kumar Sharma, Marc Casas","doi":"10.1145/3392717.3392762","DOIUrl":null,"url":null,"abstract":"Recurrent neural networks (RNNs) are widely used for natural language processing, time-series prediction, or text analysis tasks. The internal structure of RNNs inference and training in terms of data or control dependencies across their fundamental numerical kernels complicate the exploitation of model parallelism, which is the reason why just data-parallelism has been traditionally applied to accelerate RNNs. This paper presents W-Par (Wavefront-Parallelization), a comprehensive approach for RNNs inference and training on CPUs that relies on applying model parallelism into RNNs models. We use fine-grained pipeline parallelism in terms of wavefront computations to accelerate multi-layer RNNs running on multi-core CPUs. Wavefront computations have been widely applied in many scientific computing domains like stencil kernels or dynamic programming. W-Par divides RNNs workloads across different parallel tasks by defining input and output dependencies for each RNN cell. Our experiments considering different RNNs models demonstrate that W-Par achieves up to 6.6X speed-up for RNN models inference and training in comparison to current state-of-the-art implementations on modern multi-core CPU architectures. Importantly, W-Par maximizes performance on a wide range of scenarios, including different core counts or memory hierarchy configurations, without requiring any change at the source code level.","PeriodicalId":346687,"journal":{"name":"Proceedings of the 34th ACM International Conference on Supercomputing","volume":"170 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 34th ACM International Conference on Supercomputing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3392717.3392762","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Recurrent neural networks (RNNs) are widely used for natural language processing, time-series prediction, and text analysis tasks. The internal structure of RNN inference and training, in terms of data and control dependencies across their fundamental numerical kernels, complicates the exploitation of model parallelism, which is why only data parallelism has traditionally been applied to accelerate RNNs. This paper presents W-Par (Wavefront-Parallelization), a comprehensive approach for RNN inference and training on CPUs that relies on applying model parallelism to RNN models. We use fine-grained pipeline parallelism in the form of wavefront computations to accelerate multi-layer RNNs running on multi-core CPUs. Wavefront computations have been widely applied in many scientific computing domains, such as stencil kernels and dynamic programming. W-Par divides RNN workloads across different parallel tasks by defining input and output dependencies for each RNN cell. Our experiments on several RNN models demonstrate that W-Par achieves up to a 6.6x speed-up for RNN model inference and training compared to current state-of-the-art implementations on modern multi-core CPU architectures. Importantly, W-Par maximizes performance across a wide range of scenarios, including different core counts and memory hierarchy configurations, without requiring any change at the source code level.
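
The wavefront idea behind W-Par can be illustrated with a short sketch. In a multi-layer RNN, cell (l, t) depends only on cell (l-1, t) (the layer below at the same timestep) and cell (l, t-1) (the same layer at the previous timestep), so all cells on the same anti-diagonal l + t can run concurrently. The sketch below is an illustrative assumption, not the authors' W-Par implementation: the tanh cell, the sizes L, T, H, and the use of a Python thread pool are all placeholders standing in for the paper's task-based CPU runtime.

    # Minimal wavefront-parallel sketch of a multi-layer RNN forward pass.
    # Cell (l, t) reads h[l-1, t] and h[l, t-1], so every anti-diagonal
    # l + t forms an independent group of cells ("wavefront").
    import numpy as np
    from concurrent.futures import ThreadPoolExecutor

    L, T, H = 4, 8, 16                    # layers, timesteps, hidden size (illustrative)
    rng = np.random.default_rng(0)
    W_in = [rng.standard_normal((H, H)) * 0.1 for _ in range(L)]
    W_rec = [rng.standard_normal((H, H)) * 0.1 for _ in range(L)]
    x = rng.standard_normal((T, H))       # input sequence

    h = np.zeros((L, T, H))               # hidden state of every cell

    def cell(l, t):
        """Simple tanh RNN cell; its inputs were produced on earlier wavefronts."""
        below = x[t] if l == 0 else h[l - 1, t]
        prev = h[l, t - 1] if t > 0 else np.zeros(H)
        h[l, t] = np.tanh(below @ W_in[l] + prev @ W_rec[l])

    with ThreadPoolExecutor() as pool:
        # Wavefront d contains all valid cells with l + t == d. The list()
        # call acts as a barrier: one wavefront finishes before the next starts,
        # so every dependency of a cell is already computed when it runs.
        for d in range(L + T - 1):
            wave = [(l, d - l) for l in range(L) if 0 <= d - l < T]
            list(pool.map(lambda lt: cell(*lt), wave))

    print(h[-1, -1][:4])                  # output of the last layer at the last timestep

Each wavefront writes disjoint slices of h, so no locking is needed; the per-wavefront barrier is what replaces the strictly sequential timestep loop of a naive implementation.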