International Journal of Parallel Programming: Latest Articles (Page 5)

The Celerity High-level API: C++20 for Accelerator Clusters

IF 1.5, Zone 4 (Computer Science)

International Journal of Parallel Programming Pub Date : 2022-04-22 DOI: 10.1007/s10766-022-00731-8

Peter Thoman, Florian Tischler, Philip Salzmann, T. Fahringer

Citations: 5

Guest Editorial: Special Issue on 2020 IEEE International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS 2020)

IF 1.5, Zone 4 (Computer Science)

International Journal of Parallel Programming Pub Date : 2022-04-01 DOI: 10.1007/s10766-022-00732-7

M. Reichenbach, M. Jung, A. Orailoglu

Citations: 1

A Quantitative Study of Locality in GPU Caches for Memory-Divergent Workloads

IF 1.5, Zone 4 (Computer Science)

International Journal of Parallel Programming Pub Date : 2022-04-01 DOI: 10.1007/s10766-022-00729-2

S. Lal, Bogaraju Sharatchandra Varma, Ben Juurlink

Citations: 2

Fine-Grained Power Modeling of Multicore Processors Using FFNNs

IF 1.5, Zone 4 (Computer Science)

International Journal of Parallel Programming Pub Date : 2022-03-29 DOI: 10.1007/s10766-022-00730-9

Mark Sagi, Nguyen Anh Vu Doan, Nael Fasfous, Thomas Wild, A. Herkersdorf

Citations: 3

An Improved/Optimized Practical Non-Blocking PageRank Algorithm for Massive Graphs*

IF 1.5, Zone 4 (Computer Science)

International Journal of Parallel Programming Pub Date : 2022-03-26 DOI: 10.1007/s10766-022-00725-6

Hemalatha Eedi, Sahith Karra, Sathya Peri, Neha Ranabothu, Rahul Utkoor

Abstract: The PageRank kernel is a standard benchmark for a range of graph processing and analytics problems. The algorithm underlies many graph analytics tasks and serves as a foundation for extracting graph features and predicting user ratings in recommendation systems. PageRank is iterative: it repeatedly updates the ranks of pages until they converge. However, implementing it on a shared-memory architecture while exploiting fine-grained parallelism on large-scale graphs is hard. Parallel PageRank on large graphs and shared-memory architectures has been studied extensively using different programming models. This paper presents asynchronous execution of the PageRank algorithm to accelerate computation on massive graphs, especially on shared-memory architectures. We evaluate the performance of our proposed non-blocking algorithms for PageRank computation on real-world and synthetic datasets using the POSIX multithreading library on a 56-core Intel(R) Xeon processor. We observed that our asynchronous implementations achieve 10× to 30× speed-up over sequential runs and 5× to 10× improvement over synchronous variants.
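
For readers unfamiliar with the iteration the abstract refers to, the following is a minimal sequential sketch of PageRank by power iteration. It is not the paper's non-blocking algorithm: the function name `pagerank`, the damping factor 0.85, the tolerance, and the toy graph are all assumptions chosen for illustration, and every link target is assumed to appear as a key in `out_links`.

```python
def pagerank(out_links, damping=0.85, tol=1e-10, max_iter=200):
    """Sequential power-iteration PageRank over an adjacency dict.

    out_links maps each node to the list of nodes it links to.
    All parameter defaults are illustrative assumptions.
    """
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without out-links is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n
                    for v in nodes}
        # Each node passes an equal share of its rank to its targets.
        for v in nodes:
            for w in out_links[v]:
                new_rank[w] += damping * rank[v] / len(out_links[v])
        # Stop once the ranks have (approximately) converged.
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical three-page graph used only as a usage example.
ranks = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
```

Each pass here reads only the previous iteration's ranks, which is what a synchronous parallel version would enforce with a barrier between iterations; the asynchronous variants described in the abstract relax that constraint.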

Citations: 0

AMAIX In-Depth: A Generic Analytical Model for Deep Learning Accelerators

IF 1.5, Zone 4 (Computer Science)

International Journal of Parallel Programming Pub Date : 2022-03-24 DOI: 10.1007/s10766-022-00728-3

Niko Zurstraßen, Lukas Jünger, Tim Kogel, Holger Keding, Rainer Leupers

Abstract: In recent years, the growing popularity of Convolutional Neural Networks (CNNs) has driven the development of specialized hardware, so-called Deep Learning Accelerators (DLAs). The large market for DLAs and the huge number of papers published on DLA design show that there is currently no one-size-fits-all solution. Depending on the given optimization goals, such as power consumption or performance, there may be several optimal solutions for each scenario. A commonly used method for finding these solutions as early as possible in the design cycle is the use of analytical models, which try to describe a design by simple yet insightful and sufficiently accurate formulas. The main contribution of this work is the generic Analytical Model for AI accelerators (AMAIX) for estimating CNN execution time on DLAs. It is based on the popular Roofline model. To show the validity of our approach, AMAIX was applied to the Nvidia Deep Learning Accelerator (NVDLA) as a case study using the AlexNet and LeNet CNNs as workloads. The resulting performance predictions were verified against an RTL emulation of the NVDLA using a Synopsys ZeBu Server-based hybrid prototype. By refining the model following a divide-and-conquer paradigm, AMAIX predicted the inference time of AlexNet and LeNet on the NVDLA with an accuracy of 98%. Furthermore, this work shows how to use the obtained results for root-cause analysis and as a starting point for design space exploration.
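
As background for the Roofline model mentioned in the abstract, the sketch below shows the basic Roofline bound only: attainable performance is the minimum of peak compute throughput and memory bandwidth times arithmetic intensity. It does not reproduce AMAIX's refined per-layer formulas, and the function names and all hardware numbers are hypothetical.

```python
def attainable_ops_per_s(ops, bytes_moved, peak_ops_per_s, bandwidth_bytes_per_s):
    """Basic Roofline bound: min(peak compute, bandwidth * arithmetic intensity)."""
    intensity = ops / bytes_moved  # operations per byte transferred
    return min(peak_ops_per_s, bandwidth_bytes_per_s * intensity)


def roofline_time_s(ops, bytes_moved, peak_ops_per_s, bandwidth_bytes_per_s):
    """Lower bound on execution time: the slower of compute and memory transfer."""
    compute_time = ops / peak_ops_per_s
    memory_time = bytes_moved / bandwidth_bytes_per_s
    return max(compute_time, memory_time)


# Hypothetical layer and accelerator figures, for illustration only:
# 2e9 operations, 1e8 bytes moved, 1e12 ops/s peak, 1e10 bytes/s bandwidth.
layer_time = roofline_time_s(2e9, 1e8, 1e12, 1e10)
layer_perf = attainable_ops_per_s(2e9, 1e8, 1e12, 1e10)
```

With these assumed numbers the layer is memory-bound: the transfer time exceeds the compute time, so the bandwidth term sets both the time estimate and the attainable performance.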

Citations: 0

A Deterministic Portable Parallel Pseudo-Random Number Generator for Pattern-Based Programming of Heterogeneous Parallel Systems

IF 1.5, Zone 4 (Computer Science)

International Journal of Parallel Programming Pub Date : 2022-03-22 DOI: 10.1007/s10766-022-00726-5

August Ernstsson, Nicolas Vandenbergen, J. Keller, C. Kessler

Citations: 0

DRAMSys4.0: An Open-Source Simulation Framework for In-depth DRAM Analyses

IF 1.5, Zone 4 (Computer Science)

International Journal of Parallel Programming Pub Date : 2022-03-12 DOI: 10.1007/s10766-022-00727-4

Lukas Steiner, Matthias Jung, Felipe S. Prado, Kirill Bykov, N. Wehn

Citations: 2

Energy-Efficient Partial-Duplication Task Mapping Under Multiple DVFS Schemes

IF 1.5, Zone 4 (Computer Science)

International Journal of Parallel Programming Pub Date : 2022-02-16 DOI: 10.1007/s10766-022-00724-7

Minyu Cui, A. Kritikakou, L. Mo, E. Casseau

Citations: 3

Accelerating Computation of Steiner Trees on GPUs

IF 1.5, Zone 4 (Computer Science)

International Journal of Parallel Programming Pub Date : 2021-11-27 DOI: 10.1007/s10766-021-00723-0

Rajesh Pandian Muniasamy, R. Nasre, N. Narayanaswamy

Citations: 3