Proceedings of the 48th International Conference on Parallel Processing: Latest Publications

The Communication-Overlapped Hybrid Decomposition Parallel Algorithm for Multi-Scale Fluid Simulations
Yi Liu, Xiao-Wei Guo, Chao Li, Canqun Yang, X. Gan, P. Zhang, Yi Wang, Ran Zhao, Sijiang Fan
Pub Date: 2019-08-05  DOI: 10.1145/3337821.3337882
Abstract: The MCDPar algorithm (a parallel algorithm for multi-scale simulations based on mesh and BCF decomposition) significantly reduced execution time and improved parallel scalability for multi-scale fluid simulations. However, a performance bottleneck remains for extremely large-scale parallel simulations. In this paper, we design a communication-overlapped hybrid decomposition parallel algorithm to improve the performance of the original MCDPar on large-scale clusters. Through non-blocking communication and code scheduling, the communication overhead between the master and slave groups is overlapped with the master process's computation of additional microscopic configuration fields, improving the parallel efficiency and scalability of the multi-scale solver in large-scale parallel simulations. In the test case with NBCF = 1000 configuration fields and Ncell = 64000 mesh cells, the communication percentage between the corresponding master and slave processes is reduced by 39.71%. In the test case with NBCF = 3000 and Ncell = 64000, the time cost of the fastest execution is reduced by 31.13% with the communication-overlapped algorithm, which scales to 256 cores compared to the original 128 cores.
Citations: 3
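The overlap described in the abstract follows the standard post-compute-wait pattern of non-blocking communication. Below is a minimal, runnable Python sketch of that pattern, using a thread pool in place of real MPI calls; `exchange_boundary` and `compute_configuration_fields` are hypothetical stand-ins, not the paper's solver code.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def exchange_boundary(halo):
    # stand-in for a non-blocking send/receive of boundary data
    # (a real implementation would use MPI_Isend/MPI_Irecv)
    time.sleep(0.01)
    return list(halo)

def compute_configuration_fields(fields):
    # stand-in for the microscopic configuration-field computation
    # that runs while the transfer is in flight
    return [f * 2 for f in fields]

def overlapped_step(halo, fields):
    # post the "communication", compute, then wait: the overlap pattern
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(exchange_boundary, halo)   # like MPI_Isend/Irecv
        computed = compute_configuration_fields(fields)  # overlapped work
        received = pending.result()                      # like MPI_Wait
    return received, computed
```

The key point is that the wait on the transfer happens only after the independent computation has finished, so the transfer's latency is hidden behind useful work.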
When Power Oversubscription Meets Traffic Flood Attack: Re-Thinking Data Center Peak Load Management
Xiaofeng Hou, Mingyu Liang, Chao Li, Wenli Zheng, Quan Chen, M. Guo
Pub Date: 2019-08-05  DOI: 10.1145/3337821.3337856
Abstract: State-of-the-art techniques for data center peak power management are too optimistic; they overestimate their benefits in a potentially insecure operating environment. Especially in data centers that oversubscribe the power infrastructure, unexpected traffic can violate the power budget before an effective network DoS attack is observed. In this work, we take the first step toward investigating the joint effect of power throttling and traffic flooding. We characterize a special operating region in which DoS attacks can provoke undesirable power peaks without exhibiting network traffic anomalies. In this region, an attacker can trigger a power emergency by sending normal-looking traffic over the Internet. We term this new type of threat DOPE (Denial of Power and Energy). We show that existing technologies are insufficient for eliminating DOPE without negative performance effects on legitimate users. To enhance data center resiliency, we propose a request-aware power management framework called Anti-DOPE. The key feature of Anti-DOPE is bridging the gap between network traffic control and server power management. Specifically, it pre-processes incoming requests to isolate malicious power attacks on the network load balancer side, and then post-processes compute-node performance to minimize the collateral damage they may cause. Anti-DOPE is orthogonal to prior power management schemes and requires minimal system modification. Using an Alibaba container trace, we show that Anti-DOPE allows 44% shorter average response time and improves the 90th percentile tail latency by 68.1% compared to other power control methods.
Citations: 3
DICER
K. Nikas, Nikela Papadopoulou, Dimitra Giantsidi, Vasileios Karakostas, G. Goumas, N. Koziris
Pub Date: 2019-08-05  DOI: 10.1145/3337821.3337891
Abstract: Workload consolidation has been shown to improve resource utilisation in modern datacentres. In this paper we focus on the extended problem of allocating resources when co-locating High-Priority (HP) and Best-Effort (BE) applications. Current approaches either neglect this prioritisation and focus on maximising server utilisation, or favour HP execution, resulting in severe performance degradation for BEs. We propose DICER, a novel, practical, dynamic cache partitioning scheme that adapts the LLC allocation to the needs of the HP application and assigns spare cache resources to the BEs. Our evaluation reveals that DICER successfully increases the system's utilisation while minimising the impact of co-location on HP performance.
Citations: 14
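As a rough illustration of the partitioning idea only (DICER's actual policy adapts dynamically to the monitored needs of the HP application; the function below is a hypothetical static step), splitting LLC ways between an HP application and the BEs might look like:

```python
def allocate_llc_ways(total_ways, hp_demand_ways):
    # Give the HP application the ways it needs, capped so that at
    # least one way remains for the BE applications to share.
    hp_ways = max(1, min(hp_demand_ways, total_ways - 1))
    be_ways = total_ways - hp_ways
    return hp_ways, be_ways
```

On real hardware such an allocation would be enforced with a mechanism like Intel Cache Allocation Technology, re-evaluated as the HP application's demand changes.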
RFPL
Gaoxiang Xu, Dan Feng, Zhipeng Tan, Xinyan Zhang, Jie Xu, Xing Shu, Yifeng Zhu
Pub Date: 2019-08-05  DOI: 10.1145/3337821.3337887
Abstract: Parity-based RAID suffers from poor small-write performance due to heavy parity update overhead. The recently proposed EPLOG constructs a new stripe from updated data chunks without updating old parity chunks. However, due to the skewness of data accesses, old versions of updated data chunks often need to be kept to protect other data chunks of the same stripe. This seriously hurts the efficiency of recovering the system from device failures, because the preserved old data chunks on failed devices must be reconstructed. In this paper, we propose a Recovery Friendly Parity Logging scheme, called RFPL, which minimizes the small-write penalty and provides high recovery performance for SSD RAID. The key idea of RFPL is to reduce the mixture of old and new data chunks in a stripe by exploiting the skewness of data accesses. RFPL constructs a new stripe from the updated data chunks of the same old stripe. Since the cold data chunks of the old stripe are rarely updated, all data chunks written to the new stripe are likely to be hot and to become old together within a short time span. This co-aging of data chunks in a stripe effectively reduces the number of old data chunks that must be preserved. We have implemented RFPL on a RAID-5 SSD array in Linux 4.3. Experimental results show that, compared with Linux software RAID, RFPL reduces user I/O response time by 83.1% in the normal state and 81.6% in the reconstruction state. Compared with the state-of-the-art scheme EPLOG, RFPL reduces user I/O response time by 46.8% in the normal state and 40.9% in the reconstruction state. Our reliability analysis shows RFPL improves the mean time to data loss (MTTDL) by 9.36X and 1.44X compared with Linux software RAID and EPLOG, respectively.
Citations: 5
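The core grouping step (batch updated chunks by the old stripe they came from, so the chunks in each new stripe tend to become obsolete together) can be sketched as follows. This is a toy model of the idea, not RFPL's implementation; the chunk and stripe identifiers are hypothetical.

```python
from collections import defaultdict

def build_new_stripes(updated_chunks, old_stripe_of, stripe_width):
    # Group updated chunks by their old stripe id, then cut each group
    # into new stripes; chunks from the same hot stripe age together,
    # so fewer old versions must be preserved for recovery.
    by_old_stripe = defaultdict(list)
    for chunk in updated_chunks:
        by_old_stripe[old_stripe_of[chunk]].append(chunk)
    new_stripes = []
    for chunks in by_old_stripe.values():
        for i in range(0, len(chunks), stripe_width):
            new_stripes.append(chunks[i:i + stripe_width])
    return new_stripes
```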
Runtime Adaptive Task Inlining on Asynchronous Multitasking Runtime Systems
Bibek Wagle, Mohammad Alaul Haque Monil, K. Huck, A. Malony, Adrian Serio, Hartmut Kaiser
Pub Date: 2019-08-05  DOI: 10.1145/3337821.3337915
Abstract: As the era of high-frequency, single-core processors has come to a close, the new paradigm of many-core processors has come to dominate. In response, asynchronous multitasking runtime systems have been developed as a promising solution for efficiently utilizing this newly available hardware. Such runtime systems work by dividing a problem into a large number of fine-grained tasks. However, as the number of tasks created increases, the overheads associated with task creation and management cannot be ignored. Task inlining, a method in which the parent thread consumes a child task, enables the runtime system to balance parallelism against its overhead. Because it is strongly influenced by the processor architecture, the task-inlining decision is dynamic in nature. In this research, we present adaptive techniques for deciding, at runtime, whether a particular task should be inlined. We present two policies: a baseline policy that makes the inlining decision based on a fixed threshold, and an adaptive policy that adjusts the threshold dynamically at runtime. We also evaluate and justify the performance of these policies on different processor architectures. To the best of our knowledge, this is the first study of the impact of an adaptive runtime policy for task inlining in an asynchronous multitasking runtime system across different processor architectures. From experimentation, we find that the baseline policy improves execution time by 7.61% to 54.09%, and the adaptive policy improves over the baseline policy by up to 74%.
Citations: 7
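A threshold-based inlining decision of the kind the abstract describes can be sketched in a few lines. This is a toy illustration, not the paper's runtime implementation: the EWMA update rule and the microsecond units are assumptions introduced here.

```python
class AdaptiveInliningPolicy:
    # Inline a task when its estimated run time falls below a threshold
    # that tracks the measured spawn overhead (exponential moving average).
    def __init__(self, init_threshold_us=50.0, alpha=0.5):
        self.threshold_us = init_threshold_us
        self.alpha = alpha

    def should_inline(self, est_task_us):
        # tiny tasks are cheaper to run in the parent than to spawn
        return est_task_us < self.threshold_us

    def observe_spawn_overhead(self, measured_us):
        # adapt the threshold toward the overhead actually observed
        # on this processor architecture
        self.threshold_us = ((1 - self.alpha) * self.threshold_us
                             + self.alpha * measured_us)
```

A fixed-threshold baseline corresponds to never calling `observe_spawn_overhead`; the adaptive policy keeps updating the threshold as measurements arrive.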
On Integration of Appends and Merges in Log-Structured Merge Trees
Caixin Gong, Shuibing He, Yili Gong, Yingchun Lei
Pub Date: 2019-08-05  DOI: 10.1145/3337821.3337836
Abstract: As widely used indices in key-value stores, the Log-Structured Merge-tree (LSM-tree) and its variants suffer from severe write amplification due to frequent merges during compaction in write-intensive applications. To address the problem, we first propose the Log-Structured Append-tree (LSA-tree), which compacts data with appends instead of merges, significantly reducing write amplification and solving the issues in existing append trees. However, LSA increases read and space amplification. Building on LSA, we then design the Integrated Append/Merge-tree (IAM-tree). IAM selects appends or merges for each compaction operation according to the size of the memory-cached data. Theoretical analysis shows that IAM reduces the write amplification of LSM while keeping the same read and space amplification. We implement IAM as a user library named IamDB. Experiments show that its write amplification is much lower than LSM's: only 8.71 vs. 19.00 for 1TB of data with 64GB of memory. Compared with well-tuned LevelDB and RocksDB, IamDB provides 1.4-2.7× and 1.6-1.9× better write throughput and saves 12% and 10% of disk space respectively, with comparable read and scan performance. Meanwhile, IamDB achieves the most stable tail latency.
Citations: 6
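The selection rule the abstract describes (append or merge per compaction, driven by the amount of memory-cached data) can be sketched as a single decision function. The 64 MiB threshold below is a placeholder assumption, not a value from the paper.

```python
def choose_compaction(cached_bytes, merge_threshold=64 << 20):
    # Small flushes are cheaper to append (old data stays in place);
    # a large flush amortizes the cost of a full merge, which in turn
    # keeps read and space amplification in check.
    return "merge" if cached_bytes >= merge_threshold else "append"
```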
HOPE
M. Yasugi, Daisuke Muraoka, Tasuku Hiraishi, Seiji Umatani, Kento Emoto
Pub Date: 2019-08-05  DOI: 10.1145/3337821.3337899
Abstract: This paper presents a new approach to fault-tolerant language systems without a single point of failure for irregular parallel applications. Work-stealing frameworks provide good load balancing for many parallel applications, including irregular ones written in a divide-and-conquer style. However, work-stealing frameworks with fault-tolerant features such as checkpointing do not always work well. This paper proposes a completely opposite "work omission" paradigm and, as its more detailed concept, a "hierarchical omission"-based parallel execution model called HOPE. The HOPE programmer's task is to specify which regions in imperative code can be executed sequentially but in arbitrary order, and how their partial results can be accessed. HOPE workers spawn no tasks or threads at all; rather, every worker has the entire work of the program with its own planned execution order, and the workers and the underlying message mediation systems automatically exchange partial results to omit hierarchical subcomputations. Even with fault tolerance, the HOPE framework provides parallel speedups for many parallel applications, including irregular ones.
Citations: 4
PhSIH
Zhengyu Liao, Shiyou Qian, Jian Cao, Yanhua Cao, Guangtao Xue, Jiadi Yu, Yanmin Zhu, Minglu Li
Pub Date: 2019-08-05  DOI: 10.1145/3337821.3337859
Abstract: The matching algorithm is a critical component of a content-based publish/subscribe system, and its performance directly affects the QoS of the whole system. Aiming to improve and stabilize matching performance, we propose a lightweight parallelization method called PhSIH on the basis of three existing algorithms. PhSIH fulfills Parallelization by horizontally Segmenting the Indexing Hierarchy of the data structures, allowing multiple threads to perform matching tasks in parallel on a common data structure. PhSIH can adaptively adjust the degree of parallelism according to changing workloads in order to meet performance requirements. The main work of PhSIH concerns dynamically adjusting the degree of parallelism and computing a task allocation for the parallel threads. PhSIH is implemented in Apache Kafka to augment it into a content-based publish/subscribe system, which makes Kafka suitable for real-time fine-grained event dissemination scenarios such as stock ticks. To evaluate the parallelization effect and adaptability of PhSIH, a series of experiments are conducted on synthetic and real-world data. The results demonstrate that PhSIH achieves a good parallelization effect on the three existing algorithms and possesses the adaptability needed to stabilize the performance of the matching algorithms.
Citations: 7
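The horizontal-segmentation idea (split the matching index into segments that multiple threads scan in parallel for one event) can be illustrated with a toy sketch. The interval-style subscriptions and the `(subscriber, low, high)` shape below are hypothetical, standing in for whichever indexing hierarchy the underlying matching algorithm uses.

```python
from concurrent.futures import ThreadPoolExecutor

def match_segment(segment, event_value):
    # hypothetical interval subscriptions: (subscriber_id, low, high)
    return [sub for (sub, low, high) in segment
            if low <= event_value <= high]

def parallel_match(segments, event_value, parallelism=2):
    # each thread matches the event against one horizontal index segment;
    # PhSIH would additionally tune `parallelism` to the workload
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        parts = pool.map(match_segment, segments,
                         [event_value] * len(segments))
    return [sub for part in parts for sub in part]
```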
AVR
Pub Date: 2019-08-05  DOI: 10.1016/b978-0-7506-5635-1.x5018-3
(No abstract available.)
Citations: 0
Accelerating All-Edge Common Neighbor Counting on Three Processors
Yulin Che, Zhuohang Lai, Shixuan Sun, Qiong Luo, Yue Wang
Pub Date: 2019-08-05  DOI: 10.1145/3337821.3337917
Abstract: We propose to accelerate an important but time-consuming operation in online graph analytics, namely counting the common neighbors of each pair of adjacent vertices (u,v), or edge (u,v), on three modern processors of different architectures. We study two representative algorithms for this problem: (1) a merge-based pivot-skip algorithm (MPS) that intersects the two neighbor sets of each edge (u,v) to obtain the count; and (2) a bitmap-based algorithm (BMP), which dynamically constructs a bitmap index on the neighbor set of each vertex u and, for each neighbor v of u, looks up v's neighbors in u's bitmap. We parallelize and optimize both algorithms on a multicore CPU, an Intel Xeon Phi Knights Landing processor (KNL), and an NVIDIA GPU. Our experiments show that (1) both the CPU and the GPU favor BMP, whereas MPS wins on the KNL; (2) across all datasets, the best performer is either MPS on the KNL or BMP on the GPU; and (3) our optimized algorithms can complete the operation within tens of seconds on billion-edge Twitter graphs, enabling online analytics.
Citations: 4
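Sequential versions of the two counting strategies are easy to sketch. The merge loop below omits the pivot-skip (galloping) optimization of MPS, and a Python set stands in for the per-vertex bitmap of BMP, so both are simplified illustrations rather than the paper's optimized kernels; neighbor lists are assumed sorted.

```python
def mps_count(adj, u, v):
    # Merge-based intersection of the two sorted neighbor lists.
    a, b = adj[u], adj[v]
    i = j = count = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            count += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return count

def bmp_count(adj, u, v):
    # Index u's neighbors once, then probe each neighbor of v.
    marked = set(adj[u])
    return sum(1 for w in adj[v] if w in marked)
```

On dense neighborhoods the index-and-probe style does constant-time lookups per probe, while the merge style benefits from skipping long runs, which is roughly why the best choice differs across the three processors.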