{"title":"Guest Editorial: Special Issue on 2020 IEEE International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS 2020)","authors":"M. Reichenbach, M. Jung, A. Orailoglu","doi":"10.1007/s10766-022-00732-7","DOIUrl":"https://doi.org/10.1007/s10766-022-00732-7","url":null,"abstract":"","PeriodicalId":14313,"journal":{"name":"International Journal of Parallel Programming","volume":"50 1","pages":"187 - 188"},"PeriodicalIF":1.5,"publicationDate":"2022-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46420874","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Quantitative Study of Locality in GPU Caches for Memory-Divergent Workloads","authors":"S. Lal, Bogaraju Sharatchandra Varma, Ben Juurlink","doi":"10.1007/s10766-022-00729-2","DOIUrl":"https://doi.org/10.1007/s10766-022-00729-2","url":null,"abstract":"","PeriodicalId":14313,"journal":{"name":"International Journal of Parallel Programming","volume":"50 1","pages":"189 - 216"},"PeriodicalIF":1.5,"publicationDate":"2022-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41950646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fine-Grained Power Modeling of Multicore Processors Using FFNNs","authors":"Mark Sagi, Nguyen Anh Vu Doan, Nael Fasfous, Thomas Wild, A. Herkersdorf","doi":"10.1007/s10766-022-00730-9","DOIUrl":"https://doi.org/10.1007/s10766-022-00730-9","url":null,"abstract":"","PeriodicalId":14313,"journal":{"name":"International Journal of Parallel Programming","volume":"50 1","pages":"243 - 266"},"PeriodicalIF":1.5,"publicationDate":"2022-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48987889","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Improved/Optimized Practical Non-Blocking PageRank Algorithm for Massive Graphs*","authors":"Hemalatha Eedi, Sahith Karra, Sathya Peri, Neha Ranabothu, Rahul Utkoor","doi":"10.1007/s10766-022-00725-6","DOIUrl":"https://doi.org/10.1007/s10766-022-00725-6","url":null,"abstract":"<p>The PageRank kernel is a standard benchmark addressing various graph processing and analytical problems. The PageRank algorithm serves as a standard for many graph analytics and a foundation for extracting graph features and predicting user ratings in recommendation systems. The PageRank algorithm is an iterative algorithm that continuously updates the ranks of pages until it converges to a value. However, implementing the PageRank algorithm on a shared memory architecture while taking advantage of fine-grained parallelism with large-scale graphs is hard. The parallel PageRank metric on large graphs and shared memory architectures has been studied extensively, both experimentally and analytically, using different programming models. This paper presents the asynchronous execution of the PageRank algorithm to leverage the computations on massive graphs, especially on shared memory architectures. We evaluate the performance of our proposed non-blocking algorithms for PageRank computation on real-world and synthetic datasets using the POSIX Multithreaded Library on a 56-core Intel(R) Xeon processor. We observed that our asynchronous implementations achieve 10× to 30× speed-up with respect to sequential runs and 5× to 10× improvements over synchronous variants.</p>","PeriodicalId":14313,"journal":{"name":"International Journal of Parallel Programming","volume":"8 4","pages":""},"PeriodicalIF":1.5,"publicationDate":"2022-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138504168","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AMAIX In-Depth: A Generic Analytical Model for Deep Learning Accelerators","authors":"Niko Zurstraßen, Lukas Jünger, Tim Kogel, Holger Keding, Rainer Leupers","doi":"10.1007/s10766-022-00728-3","DOIUrl":"https://doi.org/10.1007/s10766-022-00728-3","url":null,"abstract":"<p>In recent years the growing popularity of Convolutional Neural Networks (CNNs) has driven the development of specialized hardware, so-called Deep Learning Accelerators (DLAs). The large market for DLAs and the large number of papers published on DLA design show that there is currently no one-size-fits-all solution. Depending on the given optimization goals such as power consumption or performance, there may be several optimal solutions for each scenario. A commonly used method for finding these solutions as early as possible in the design cycle is the employment of analytical models which try to describe a design by simple yet insightful and sufficiently accurate formulas. The main contribution of this work is the generic Analytical Model for AI accelerators (AMAIX) for the estimation of CNN execution time on DLAs. It is based on the popular Roofline model. To show the validity of our approach, AMAIX was applied to the Nvidia Deep Learning Accelerator (NVDLA) as a case study using the AlexNet and LeNet CNNs as workloads. The resulting performance predictions were verified against an RTL emulation of the NVDLA using a Synopsys ZeBu Server-based hybrid prototype. By refining the model following a divide-and-conquer paradigm, AMAIX predicted the inference time of AlexNet and LeNet on the NVDLA with an accuracy of 98%. Furthermore, this work shows how to use the obtained results for root-cause analysis and as a starting point for design space exploration.</p>","PeriodicalId":14313,"journal":{"name":"International Journal of Parallel Programming","volume":"8 5","pages":""},"PeriodicalIF":1.5,"publicationDate":"2022-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138504167","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Deterministic Portable Parallel Pseudo-Random Number Generator for Pattern-Based Programming of Heterogeneous Parallel Systems","authors":"August Ernstsson, Nicolas Vandenbergen, J. Keller, C. Kessler","doi":"10.1007/s10766-022-00726-5","DOIUrl":"https://doi.org/10.1007/s10766-022-00726-5","url":null,"abstract":"","PeriodicalId":14313,"journal":{"name":"International Journal of Parallel Programming","volume":"50 1","pages":"319 - 340"},"PeriodicalIF":1.5,"publicationDate":"2022-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47541954","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DRAMSys4.0: An Open-Source Simulation Framework for In-depth DRAM Analyses","authors":"Lukas Steiner, Matthias Jung, Felipe S. Prado, Kirill Bykov, N. Wehn","doi":"10.1007/s10766-022-00727-4","DOIUrl":"https://doi.org/10.1007/s10766-022-00727-4","url":null,"abstract":"","PeriodicalId":14313,"journal":{"name":"International Journal of Parallel Programming","volume":"50 1","pages":"217 - 242"},"PeriodicalIF":1.5,"publicationDate":"2022-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42363470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Energy-Efficient Partial-Duplication Task Mapping Under Multiple DVFS Schemes","authors":"Minyu Cui, A. Kritikakou, L. Mo, E. Casseau","doi":"10.1007/s10766-022-00724-7","DOIUrl":"https://doi.org/10.1007/s10766-022-00724-7","url":null,"abstract":"","PeriodicalId":14313,"journal":{"name":"International Journal of Parallel Programming","volume":"50 1","pages":"267 - 294"},"PeriodicalIF":1.5,"publicationDate":"2022-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47224032","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accelerating Computation of Steiner Trees on GPUs","authors":"Rajesh Pandian Muniasamy, R. Nasre, N. Narayanaswamy","doi":"10.1007/s10766-021-00723-0","DOIUrl":"https://doi.org/10.1007/s10766-021-00723-0","url":null,"abstract":"","PeriodicalId":14313,"journal":{"name":"International Journal of Parallel Programming","volume":"50 1","pages":"152 - 185"},"PeriodicalIF":1.5,"publicationDate":"2021-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47236477","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}