Workshop Proceedings of the 49th International Conference on Parallel Processing: Latest Publications

Symmetric Tokens based Group Mutual Exclusion

Workshop Proceedings of the 49th International Conference on Parallel Processing Pub Date : 2020-08-17 DOI: 10.1145/3409390.3409395

A. Aravind

Citations: 1

Fast Modeling of Network Contention in Batch Point-to-point Communications by Packet-level Simulation with Dynamic Time-stepping

Workshop Proceedings of the 49th International Conference on Parallel Processing Pub Date : 2020-08-17 DOI: 10.1145/3409390.3409398

Zhang Yang, Jintao Peng, Qingkai Liu

Abstract: Network contention has long been one of the root causes of performance loss in large-scale parallel applications. With the growing importance of performance modeling for both large-scale application optimization and application-system co-design, the tension between speed and accuracy in contention modeling is becoming prominent. Cycle-accurate network simulators are often too slow for large-scale applications, while point-to-point analytical models are not accurate enough to capture contention effects. To model network contention in batch point-to-point communications, we propose a unified contention model modeled after the flow-fair end-to-end congestion control mechanism. The model uses packet-level simulation for accuracy, but can be approximated by a fast flow-level semi-analytical model when messages are large enough. Furthermore, we propose a dynamic time-stepping technique that significantly speeds up the packet-level simulation with only minor accuracy loss. Experiments with typical communication patterns and application traces show that our model accurately predicts communication time, with an average error of 9% (fixed time step), and that the dynamic time-stepping technique improves simulation performance by up to 131× with an average accuracy loss of 10.5% on real application traces.
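The flow-level approximation mentioned in this abstract can be pictured with a small sketch. It is not the authors' model: it assumes that flow-fair congestion control converges to max-min fair link sharing, computes those rates by progressive filling, and recomputes them each time a message finishes to obtain batch completion times. All names (`max_min_rates`, `completion_times`, the link and flow labels) are hypothetical.

```python
from typing import Dict, List

# Hypothetical illustration, not the paper's model: flows share links under
# max-min fairness, a common idealization of flow-fair congestion control.


def max_min_rates(routes: Dict[str, List[str]],
                  capacity: Dict[str, float]) -> Dict[str, float]:
    """Progressive filling: raise every unfrozen flow's rate equally until a
    link saturates, then freeze the flows that cross that link."""
    rates = {f: 0.0 for f in routes}
    remaining = dict(capacity)
    active = set(routes)
    while active:
        # Equal share each link can still give to the active flows crossing it.
        shares = {}
        for link, cap in remaining.items():
            n = sum(1 for f in active if link in routes[f])
            if n:
                shares[link] = max(cap, 0.0) / n
        if not shares:
            break  # remaining flows cross no capacitated link
        bottleneck = min(shares, key=shares.get)
        inc = shares[bottleneck]
        for f in active:
            rates[f] += inc
            for link in routes[f]:
                remaining[link] -= inc
        active -= {f for f in active if bottleneck in routes[f]}
    return rates


def completion_times(sizes: Dict[str, float],
                     routes: Dict[str, List[str]],
                     capacity: Dict[str, float]) -> Dict[str, float]:
    """Event-driven flow-level model: fair rates are recomputed whenever a
    message finishes. Assumes every flow crosses a positive-capacity link."""
    left = dict(sizes)
    t = 0.0
    done: Dict[str, float] = {}
    while left:
        rates = max_min_rates({f: routes[f] for f in left}, capacity)
        dt = min(left[f] / rates[f] for f in left)  # time to next completion
        t += dt
        for f in list(left):
            left[f] -= rates[f] * dt
            if left[f] <= 1e-9 * sizes[f]:
                done[f] = t
                del left[f]
    return done
```

For example, with links L1 (capacity 10) and L2 (capacity 4), flow A over L1, B over L1 and L2, and C over L2, `max_min_rates` assigns A=8, B=2, C=2. With message sizes A=16, B=4, C=8, `completion_times` reports A and B finishing at t=2 and C at t=3, since C speeds up to 4 once B leaves L2. A packet-level simulator would instead capture queueing and transient effects that this idealization ignores.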

Citations: 0

Communication-aware Job Scheduling using SLURM

Workshop Proceedings of the 49th International Conference on Parallel Processing Pub Date : 2020-08-17 DOI: 10.1145/3409390.3409410

P. Mishra, Tushar Agrawal, Preeti Malakar

Citations: 2

Preference Aware Smart Hospital Selection System for Patients

Workshop Proceedings of the 49th International Conference on Parallel Processing Pub Date : 2020-08-17 DOI: 10.1145/3409390.3409391

Md. Solaiman Chowdhury, Jenifar Rahman, Md. Mahfuzur Rahman

Citations: 0

A GCC-based Compliance Checker for Single-translation-unit, Identifier-related MISRA-C Rules

Workshop Proceedings of the 49th International Conference on Parallel Processing Pub Date : 2020-08-17 DOI: 10.1145/3409390.3409396

Guan-Ren Wang, Peng-Sheng Chen

Citations: 0

Assessing the Overhead of Offloading Compression Tasks

Workshop Proceedings of the 49th International Conference on Parallel Processing Pub Date : 2020-08-17 DOI: 10.1145/3409390.3409405

L. Promberger, R. Schwemmer, H. Fröning

Abstract: Compression is an increasingly promising way to trade computation for data movement, for two main reasons. First, the gap between processing speed and I/O continues to grow, and technology trends indicate that this will continue. Second, performance is determined by energy efficiency, and overall power consumption is dominated by data movement. For these reasons, there is already a plethora of related work on compression from various domains. Most recently, several accelerators have been introduced to offload compression tasks from the main processor, for instance by AHA, Intel and Microsoft. Yet the overhead of compression when offloading tasks is not well understood. In particular, offloading is most beneficial when it overlaps with other tasks, which requires the associated overhead on the main processor to be negligible. This work evaluates the integration costs compared to a purely software-based solution, considering multiple compression algorithms. Among other data, High Energy Physics data are used as a prime example of a big-data source. The results indicate that, on average, the zlib implementation on the accelerator achieves a compression ratio comparable to zlib level 2 on a CPU, while delivering up to 17 times the throughput and using over 80% fewer CPU resources. These results suggest that, given the right orchestration of compression and data movement tasks, the overhead of offloading compression is limited but present. Considering that compression is only a single task in a larger data processing pipeline, this overhead cannot be neglected.
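The software baseline in this comparison, zlib on a CPU at a given compression level, can be measured with a short sketch using Python's standard `zlib` module. It covers only the CPU side: accelerator offload interfaces (AHA, Intel, Microsoft) are not described in the abstract and are not shown, and CPU utilization is not measured. The helper names and the synthetic payload are hypothetical stand-ins for real High Energy Physics data.

```python
import random
import time
import zlib
from typing import Tuple


def measure(data: bytes, level: int) -> Tuple[float, float]:
    """Compress with zlib at `level`; return (compression ratio, MB/s).
    The round-trip check guards against silently broken output."""
    start = time.perf_counter()
    packed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    assert zlib.decompress(packed) == data
    ratio = len(data) / len(packed)
    mb_per_s = len(data) / 1e6 / elapsed if elapsed > 0 else float("inf")
    return ratio, mb_per_s


def synthetic_payload(n: int, seed: int = 0) -> bytes:
    """Hypothetical stand-in for real data: low-entropy random bytes."""
    rng = random.Random(seed)
    return bytes(rng.choice(b"ACGT ") for _ in range(n))


if __name__ == "__main__":
    data = synthetic_payload(200_000)
    for level in (1, 2, 6, 9):
        ratio, speed = measure(data, level)
        print(f"level {level}: ratio {ratio:.2f}, {speed:.1f} MB/s")
```

Timings from such a loop depend heavily on the machine and the data, so the numbers it prints should not be compared directly with the figures reported in the paper.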

Citations: 3

Improving the Space-Time Efficiency of Matrix Multiplication Algorithms

Workshop Proceedings of the 49th International Conference on Parallel Processing Pub Date : 2020-08-17 DOI: 10.1145/3409390.3409404

Yuan Tang

Citations: 0

Network and Load-Aware Resource Manager for MPI Programs

Workshop Proceedings of the 49th International Conference on Parallel Processing Pub Date : 2020-08-17 DOI: 10.1145/3409390.3409406

Ashish Kumar Kumar, N. Jain, Preeti Malakar

Citations: 0

BSRNG: A High Throughput Parallel BitSliced Approach for Random Number Generators

Workshop Proceedings of the 49th International Conference on Parallel Processing Pub Date : 2020-08-17 DOI: 10.1145/3409390.3409402

Saleh Khalaj Monfared, Omid Hajihassani, M. Kiarostami, S. M. Zanjani, Dara Rahmati, S. Gorgin

Abstract: In this work, a high-throughput method for generating high-quality pseudo-random numbers using the bitslicing technique is proposed. Instead of the conventional row-major data representation, the technique employs a column-major representation, which allows the bitsliced implementation to take full advantage of the entire available datapath of the hardware platform. Using this data representation as the building block of the algorithms, we showcase the capability and scalability of the proposed method on various PRNGs in the category of block and stream ciphers. The LFSR-based (Linear Feedback Shift Register) nature of the PRNGs in our implementation perfectly suits the GPU's many-core structure because of its register-oriented architecture. In the proposed SIMD-vectorized GPU implementation, each GPU thread can generate several sets of 32 pseudo-random bits in each LFSR clock cycle. We then compare our implementation with some of the most significant PRNGs that offer satisfactory throughput and randomness. The proposed implementation passes the NIST tests for statistical randomness and the bit-wise correlation criteria. Compared with computer-based and optical PRNG solutions, the technique is efficient in terms of performance and performance per cost while maintaining acceptable randomness. Among all CPRNGs implemented with the proposed method, the highest performance is achieved by the MICKEY 2.0 algorithm, which shows a 40% improvement over cuRAND, NVIDIA's state-of-the-art proprietary high-performance PRNG library, reaching 2.72 Tb/s of throughput on the affordable NVIDIA RTX 2080 Ti.
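The column-major (bitsliced) idea can be illustrated with a toy LFSR. This sketch does not implement MICKEY 2.0 or any cipher from the paper. It assumes a 16-bit Fibonacci LFSR with the maximal-length polynomial x^16 + x^14 + x^13 + x^11 + 1 and packs 32 independent instances so that word i holds state bit i of every lane. One clock of all 32 generators then costs a few word-wide XORs plus a re-indexing of the word list, and yields one 32-bit output word. Python integers stand in for the 32-bit GPU registers; all names are hypothetical.

```python
from typing import List, Tuple

# Toy 16-bit Fibonacci LFSR, x^16 + x^14 + x^13 + x^11 + 1 (taps 16, 14, 13, 11).
# Hypothetical illustration of bitslicing; not one of the BSRNG generators.
TAPS = (0, 2, 3, 5)  # state bit positions XORed into the feedback bit
WIDTH = 16           # state bits per LFSR
LANES = 32           # independent LFSRs, i.e. bits per emulated register


def scalar_step(state: int) -> Tuple[int, int]:
    """Reference (row-major): clock one LFSR; return (output bit, new state)."""
    out = state & 1
    fb = 0
    for t in TAPS:
        fb ^= (state >> t) & 1
    return out, (state >> 1) | (fb << (WIDTH - 1))


def to_bitsliced(seeds: List[int]) -> List[int]:
    """Transpose LANES row-major states into WIDTH column-major words:
    bit j of words[i] is bit i of lane j's state."""
    return [sum(((s >> i) & 1) << j for j, s in enumerate(seeds))
            for i in range(WIDTH)]


def bitsliced_step(words: List[int]) -> Tuple[int, List[int]]:
    """Clock all LANES LFSRs at once. Returns one output word
    (bit j = lane j's output bit) and the new column-major state."""
    out = words[0]
    fb = 0
    for t in TAPS:
        fb ^= words[t]  # one XOR updates the feedback of all 32 lanes
    # Shifting every state right by one is just re-indexing the words.
    return out, words[1:] + [fb]


if __name__ == "__main__":
    seeds = [(0xACE1 + 17 * j) & 0xFFFF for j in range(LANES)]
    words = to_bitsliced(seeds)
    for _ in range(4):
        out_word, words = bitsliced_step(words)
        print(f"{out_word:032b}")
```

`scalar_step` serves as a row-major reference: each bit j of a bitsliced output word should equal what `scalar_step` produces for lane j. On a GPU, the same layout would keep the 16 state words in registers, so each thread advances 32 generators per clock with no per-lane branching.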

Citations: 3

Randomized Authentication using IBE for Opportunistic Networks

Workshop Proceedings of the 49th International Conference on Parallel Processing Pub Date : 2020-08-17 DOI: 10.1145/3409390.3409392

Kai Wang, Kazuya Sakai

Citations: 2