International Journal of Parallel Programming: Latest Articles

The Celerity High-level API: C++20 for Accelerator Clusters
IF 1.5 | Zone 4 | Computer Science
International Journal of Parallel Programming Pub Date: 2022-04-22 DOI: 10.1007/s10766-022-00731-8
Peter Thoman, Florian Tischler, Philip Salzmann, T. Fahringer
Volume: 50 1 | Pages: 341-359 | Type: Journal Article
Citations: 5
Guest Editorial: Special Issue on 2020 IEEE International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS 2020)
IF 1.5 | Zone 4 | Computer Science
International Journal of Parallel Programming Pub Date: 2022-04-01 DOI: 10.1007/s10766-022-00732-7
M. Reichenbach, M. Jung, A. Orailoglu
Volume: 50 1 | Pages: 187-188 | Type: Journal Article
Citations: 1
A Quantitative Study of Locality in GPU Caches for Memory-Divergent Workloads
IF 1.5 | Zone 4 | Computer Science
International Journal of Parallel Programming Pub Date: 2022-04-01 DOI: 10.1007/s10766-022-00729-2
S. Lal, Bogaraju Sharatchandra Varma, Ben Juurlink
Volume: 50 1 | Pages: 189-216 | Type: Journal Article
Citations: 2
Fine-Grained Power Modeling of Multicore Processors Using FFNNs
IF 1.5 | Zone 4 | Computer Science
International Journal of Parallel Programming Pub Date: 2022-03-29 DOI: 10.1007/s10766-022-00730-9
Mark Sagi, Nguyen Anh Vu Doan, Nael Fasfous, Thomas Wild, A. Herkersdorf
Volume: 50 1 | Pages: 243-266 | Type: Journal Article
Citations: 3
An Improved/Optimized Practical Non-Blocking PageRank Algorithm for Massive Graphs*
IF 1.5 | Zone 4 | Computer Science
International Journal of Parallel Programming Pub Date: 2022-03-26 DOI: 10.1007/s10766-022-00725-6
Hemalatha Eedi, Sahith Karra, Sathya Peri, Neha Ranabothu, Rahul Utkoor
Abstract: The PageRank kernel is a standard benchmark addressing various graph processing and analytical problems. The PageRank algorithm serves as a standard for many graph analytics and a foundation for extracting graph features and predicting user ratings in recommendation systems. It is an iterative algorithm that repeatedly updates the ranks of pages until they converge. However, implementing PageRank on a shared memory architecture while exploiting fine-grained parallelism on large-scale graphs is hard. Parallel PageRank on large graphs and shared memory architectures has been studied extensively, both experimentally and analytically, using different programming models. This paper presents the asynchronous execution of the PageRank algorithm to leverage the computations on massive graphs, especially on shared memory architectures. We evaluate the performance of our proposed non-blocking algorithms for PageRank computation on real-world and synthetic datasets using the POSIX multithreaded library on a 56-core Intel(R) Xeon processor. We observed that our asynchronous implementations achieve 10× to 30× speed-up with respect to sequential runs and 5× to 10× improvements over synchronous variants.
Volume: 8 4 | Type: Journal Article
Citations: 0
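Editor's illustration: the abstract above describes PageRank as an iterative update that runs until the ranks converge. A minimal sequential sketch of that loop might look as follows. It is not the non-blocking algorithm proposed in the paper; the function name `pagerank`, the damping factor 0.85, the tolerance, and the toy graph are all assumptions chosen for illustration.

```python
def pagerank(out_links, damping=0.85, tol=1e-10, max_iter=200):
    """Sequential power-iteration PageRank (illustrative sketch only).

    out_links maps every node to the list of nodes it links to; every
    link target is assumed to appear as a key as well.
    """
    nodes = sorted(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without out-links is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}
        # Each node passes an equal share of its rank to its targets.
        for v in nodes:
            targets = out_links[v]
            for w in targets:
                new_rank[w] += damping * rank[v] / len(targets)
        # Stop once the total (L1) change falls below the tolerance.
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical three-page graph: a links to b and c, b to c, c to a.
ranks = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
```

In a synchronous parallel version, every thread would wait at a barrier before `rank` is swapped for `new_rank`; the asynchronous variants mentioned in the abstract relax that waiting, which this sketch does not attempt to show.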
AMAIX In-Depth: A Generic Analytical Model for Deep Learning Accelerators
IF 1.5 | Zone 4 | Computer Science
International Journal of Parallel Programming Pub Date: 2022-03-24 DOI: 10.1007/s10766-022-00728-3
Niko Zurstraßen, Lukas Jünger, Tim Kogel, Holger Keding, Rainer Leupers
Abstract: In recent years the growing popularity of Convolutional Neural Networks (CNNs) has driven the development of specialized hardware, so-called Deep Learning Accelerators (DLAs). The large market for DLAs and the large number of papers published on DLA design show that there is currently no one-size-fits-all solution. Depending on the given optimization goals, such as power consumption or performance, there may be several optimal solutions for each scenario. A commonly used method for finding these solutions as early as possible in the design cycle is the employment of analytical models, which try to describe a design by simple yet insightful and sufficiently accurate formulas. The main contribution of this work is the generic Analytical Model for AI accelerators (AMAIX) for the estimation of CNN execution time on DLAs. It is based on the popular Roofline model. To show the validity of our approach, AMAIX was applied to the Nvidia Deep Learning Accelerator (NVDLA) as a case study using the AlexNet and LeNet CNNs as workloads. The resulting performance predictions were verified against an RTL emulation of the NVDLA using a Synopsys ZeBu Server-based hybrid prototype. By refining the model following a divide-and-conquer paradigm, AMAIX predicted the inference time of AlexNet and LeNet on the NVDLA with an accuracy of 98%. Furthermore, this work shows how to use the obtained results for root-cause analysis and as a starting point for design space exploration.
Volume: 8 5 | Type: Journal Article
Citations: 0
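Editor's illustration: the abstract above states that AMAIX builds on the Roofline model. The basic Roofline bound can be sketched as below. This is the generic textbook bound, not the AMAIX model itself; the function names and all hardware and workload numbers are hypothetical.

```python
def attainable_performance(ops, bytes_moved, peak_ops_per_s, bandwidth_bytes_per_s):
    """Roofline bound: min(peak compute, bandwidth * operational intensity)."""
    intensity = ops / bytes_moved  # operations per byte moved
    return min(peak_ops_per_s, bandwidth_bytes_per_s * intensity)


def roofline_time(ops, bytes_moved, peak_ops_per_s, bandwidth_bytes_per_s):
    """Lower bound on execution time: the slower of compute and memory."""
    compute_time = ops / peak_ops_per_s
    memory_time = bytes_moved / bandwidth_bytes_per_s
    return max(compute_time, memory_time)


# Hypothetical layer: 2e9 operations and 1e8 bytes of traffic on a
# device with 1e12 ops/s peak compute and 1e10 bytes/s of bandwidth.
perf = attainable_performance(2e9, 1e8, 1e12, 1e10)
time_s = roofline_time(2e9, 1e8, 1e12, 1e10)
```

With these made-up numbers the operational intensity is 20 operations per byte, so the memory term dominates and the layer counts as memory-bound. A per-layer estimate of this kind, summed over a network, is one plausible reading of the divide-and-conquer refinement the abstract mentions, though the paper's actual formulation may differ.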
A Deterministic Portable Parallel Pseudo-Random Number Generator for Pattern-Based Programming of Heterogeneous Parallel Systems
IF 1.5 | Zone 4 | Computer Science
International Journal of Parallel Programming Pub Date: 2022-03-22 DOI: 10.1007/s10766-022-00726-5
August Ernstsson, Nicolas Vandenbergen, J. Keller, C. Kessler
Volume: 50 1 | Pages: 319-340 | Type: Journal Article
Citations: 0
DRAMSys4.0: An Open-Source Simulation Framework for In-depth DRAM Analyses
IF 1.5 | Zone 4 | Computer Science
International Journal of Parallel Programming Pub Date: 2022-03-12 DOI: 10.1007/s10766-022-00727-4
Lukas Steiner, Matthias Jung, Felipe S. Prado, Kirill Bykov, N. Wehn
Volume: 50 1 | Pages: 217-242 | Type: Journal Article
Citations: 2
Energy-Efficient Partial-Duplication Task Mapping Under Multiple DVFS Schemes
IF 1.5 | Zone 4 | Computer Science
International Journal of Parallel Programming Pub Date: 2022-02-16 DOI: 10.1007/s10766-022-00724-7
Minyu Cui, A. Kritikakou, L. Mo, E. Casseau
Volume: 50 1 | Pages: 267-294 | Type: Journal Article
Citations: 3
Accelerating Computation of Steiner Trees on GPUs
IF 1.5 | Zone 4 | Computer Science
International Journal of Parallel Programming Pub Date: 2021-11-27 DOI: 10.1007/s10766-021-00723-0
Rajesh Pandian Muniasamy, R. Nasre, N. Narayanaswamy
Volume: 50 1 | Pages: 152-185 | Type: Journal Article
Citations: 3