Proceedings of the 48th International Conference on Parallel Processing: Latest Publications

The Communication-Overlapped Hybrid Decomposition Parallel Algorithm for Multi-Scale Fluid Simulations
Yi Liu, Xiao-Wei Guo, Chao Li, Canqun Yang, X. Gan, P. Zhang, Yi Wang, Ran Zhao, Sijiang Fan
Pub Date: 2019-08-05  DOI: 10.1145/3337821.3337882
Abstract: The MCDPar algorithm (a parallel algorithm for multi-scale simulations based on mesh and BCF decomposition) significantly reduced execution time and improved parallel scalability for multi-scale fluid simulations. However, a performance bottleneck remains for extremely large-scale parallel simulations. In this paper, we design a communication-overlapped hybrid decomposition parallel algorithm to improve the performance of the original MCDPar on large-scale clusters. Through non-blocking communication and code scheduling, the communication overhead between the master and slave groups is overlapped with the master process's computation of additional microscopic configuration fields, improving the parallel efficiency and scalability of the multi-scale solver in large-scale parallel simulations. In the test case with NBCF = 1000 configuration fields and Ncell = 64000 mesh cells, the communication percentage between the corresponding master and slave processes is reduced by 39.71%. In the test case with NBCF = 3000 and Ncell = 64000, the time cost of the fastest execution is reduced by 31.13% with the communication-overlapped algorithm, which scales to 256 cores compared to the original 128 cores.
Citations: 3
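The overlap described in the abstract follows the standard post-compute-wait pattern of non-blocking communication. Below is a minimal, runnable Python sketch of that pattern, using a thread pool in place of real MPI calls; `exchange_boundary` and `compute_configuration_fields` are hypothetical stand-ins, not the paper's solver code.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def exchange_boundary(halo):
    # stand-in for a non-blocking send/receive of boundary data
    # (a real implementation would use MPI_Isend/MPI_Irecv)
    time.sleep(0.01)
    return list(halo)

def compute_configuration_fields(fields):
    # stand-in for the microscopic configuration-field computation
    # that runs while the transfer is in flight
    return [f * 2 for f in fields]

def overlapped_step(halo, fields):
    # post the "communication", compute, then wait: the overlap pattern
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(exchange_boundary, halo)   # like MPI_Isend/Irecv
        computed = compute_configuration_fields(fields)  # overlapped work
        received = pending.result()                      # like MPI_Wait
    return received, computed
```

The key point is that the wait on the transfer happens only after the independent computation has finished, so the transfer's latency is hidden behind useful work.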
When Power Oversubscription Meets Traffic Flood Attack: Re-Thinking Data Center Peak Load Management
Xiaofeng Hou, Mingyu Liang, Chao Li, Wenli Zheng, Quan Chen, M. Guo
Pub Date: 2019-08-05  DOI: 10.1145/3337821.3337856
Abstract: State-of-the-art techniques for data center peak power management are too optimistic; they overestimate their benefits in a potentially insecure operating environment. Especially in data centers that oversubscribe the power infrastructure, unexpected traffic can violate the power budget before an effective network DoS attack is observed. In this work, we take the first step toward investigating the joint effect of power throttling and traffic flooding. We characterize a special operating region in which DoS attacks can provoke undesirable power peaks without exhibiting network traffic anomalies. In this region, an attacker can trigger a power emergency by sending normal-looking traffic over the Internet. We term this new type of threat DOPE (Denial of Power and Energy). We show that existing technologies are insufficient for eliminating DOPE without negative performance effects on legitimate users. To enhance data center resiliency, we propose a request-aware power management framework called Anti-DOPE. The key feature of Anti-DOPE is bridging the gap between network traffic control and server power management. Specifically, it pre-processes incoming requests to isolate malicious power attacks on the network load balancer side, and then post-processes compute-node performance to minimize the collateral damage they may cause. Anti-DOPE is orthogonal to prior power management schemes and requires minimal system modification. Using an Alibaba container trace, we show that Anti-DOPE allows 44% shorter average response time and improves the 90th percentile tail latency by 68.1% compared to other power control methods.
Citations: 3
DICER
K. Nikas, Nikela Papadopoulou, Dimitra Giantsidi, Vasileios Karakostas, G. Goumas, N. Koziris
Pub Date: 2019-08-05  DOI: 10.1145/3337821.3337891
Abstract: Workload consolidation has been shown to improve resource utilisation in modern datacentres. In this paper we focus on the extended problem of allocating resources when co-locating High-Priority (HP) and Best-Effort (BE) applications. Current approaches either neglect this prioritisation and focus on maximising server utilisation, or favour HP execution, resulting in severe performance degradation for BEs. We propose DICER, a novel, practical, dynamic cache partitioning scheme that adapts the LLC allocation to the needs of the HP application and assigns spare cache resources to the BEs. Our evaluation reveals that DICER successfully increases the system's utilisation while minimising the impact of co-location on HP performance.
Citations: 14
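As a rough illustration of the partitioning idea only (DICER's actual policy adapts dynamically to the monitored needs of the HP application; the function below is a hypothetical static step), splitting LLC ways between an HP application and the BEs might look like:

```python
def allocate_llc_ways(total_ways, hp_demand_ways):
    # Give the HP application the ways it needs, capped so that at
    # least one way remains for the BE applications to share.
    hp_ways = max(1, min(hp_demand_ways, total_ways - 1))
    be_ways = total_ways - hp_ways
    return hp_ways, be_ways
```

On real hardware such an allocation would be enforced with a mechanism like Intel Cache Allocation Technology, re-evaluated as the HP application's demand changes.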
RFPL
Gaoxiang Xu, Dan Feng, Zhipeng Tan, Xinyan Zhang, Jie Xu, Xing Shu, Yifeng Zhu
Pub Date: 2019-08-05  DOI: 10.1145/3337821.3337887
Abstract: Parity-based RAID suffers from poor small-write performance due to heavy parity update overhead. The recently proposed EPLOG constructs a new stripe from updated data chunks without updating old parity chunks. However, due to the skewness of data accesses, old versions of updated data chunks often need to be kept to protect other data chunks of the same stripe. This seriously hurts the efficiency of recovering the system from device failures, because the preserved old data chunks on failed devices must be reconstructed. In this paper, we propose a Recovery Friendly Parity Logging scheme, called RFPL, which minimizes the small-write penalty and provides high recovery performance for SSD RAID. The key idea of RFPL is to reduce the mixture of old and new data chunks in a stripe by exploiting the skewness of data accesses. RFPL constructs a new stripe from the updated data chunks of the same old stripe. Since the cold data chunks of the old stripe are rarely updated, all data chunks written to the new stripe are likely to be hot and to become old together within a short time span. This co-aging of data chunks in a stripe effectively reduces the number of old data chunks that must be preserved. We have implemented RFPL on a RAID-5 SSD array in Linux 4.3. Experimental results show that, compared with Linux software RAID, RFPL reduces user I/O response time by 83.1% in the normal state and 81.6% in the reconstruction state. Compared with the state-of-the-art scheme EPLOG, RFPL reduces user I/O response time by 46.8% in the normal state and 40.9% in the reconstruction state. Our reliability analysis shows RFPL improves the mean time to data loss (MTTDL) by 9.36X and 1.44X compared with Linux software RAID and EPLOG, respectively.
Citations: 5
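The core grouping step (batch updated chunks by the old stripe they came from, so the chunks in each new stripe tend to become obsolete together) can be sketched as follows. This is a toy model of the idea, not RFPL's implementation; the chunk and stripe identifiers are hypothetical.

```python
from collections import defaultdict

def build_new_stripes(updated_chunks, old_stripe_of, stripe_width):
    # Group updated chunks by their old stripe id, then cut each group
    # into new stripes; chunks from the same hot stripe age together,
    # so fewer old versions must be preserved for recovery.
    by_old_stripe = defaultdict(list)
    for chunk in updated_chunks:
        by_old_stripe[old_stripe_of[chunk]].append(chunk)
    new_stripes = []
    for chunks in by_old_stripe.values():
        for i in range(0, len(chunks), stripe_width):
            new_stripes.append(chunks[i:i + stripe_width])
    return new_stripes
```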
Runtime Adaptive Task Inlining on Asynchronous Multitasking Runtime Systems
Bibek Wagle, Mohammad Alaul Haque Monil, K. Huck, A. Malony, Adrian Serio, Hartmut Kaiser
Pub Date: 2019-08-05  DOI: 10.1145/3337821.3337915
Abstract: As the era of high-frequency, single-core processors has come to a close, the new paradigm of many-core processors has come to dominate. In response, asynchronous multitasking runtime systems have been developed as a promising solution for efficiently utilizing this newly available hardware. Such runtime systems work by dividing a problem into a large number of fine-grained tasks. However, as the number of tasks created increases, the overheads associated with task creation and management cannot be ignored. Task inlining, a method in which the parent thread consumes a child task, enables the runtime system to balance parallelism against its overhead. Because it is strongly influenced by the processor architecture, the task-inlining decision is dynamic in nature. In this research, we present adaptive techniques for deciding, at runtime, whether a particular task should be inlined. We present two policies: a baseline policy that makes the inlining decision based on a fixed threshold, and an adaptive policy that adjusts the threshold dynamically at runtime. We also evaluate and justify the performance of these policies on different processor architectures. To the best of our knowledge, this is the first study of the impact of an adaptive runtime policy for task inlining in an asynchronous multitasking runtime system across different processor architectures. From experimentation, we find that the baseline policy improves execution time by 7.61% to 54.09%, and the adaptive policy improves over the baseline policy by up to 74%.
Citations: 7
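A threshold-based inlining decision of the kind the abstract describes can be sketched in a few lines. This is a toy illustration, not the paper's runtime implementation: the EWMA update rule and the microsecond units are assumptions introduced here.

```python
class AdaptiveInliningPolicy:
    # Inline a task when its estimated run time falls below a threshold
    # that tracks the measured spawn overhead (exponential moving average).
    def __init__(self, init_threshold_us=50.0, alpha=0.5):
        self.threshold_us = init_threshold_us
        self.alpha = alpha

    def should_inline(self, est_task_us):
        # tiny tasks are cheaper to run in the parent than to spawn
        return est_task_us < self.threshold_us

    def observe_spawn_overhead(self, measured_us):
        # adapt the threshold toward the overhead actually observed
        # on this processor architecture
        self.threshold_us = ((1 - self.alpha) * self.threshold_us
                             + self.alpha * measured_us)
```

A fixed-threshold baseline corresponds to never calling `observe_spawn_overhead`; the adaptive policy keeps updating the threshold as measurements arrive.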
On Integration of Appends and Merges in Log-Structured Merge Trees
Caixin Gong, Shuibing He, Yili Gong, Yingchun Lei
Pub Date: 2019-08-05  DOI: 10.1145/3337821.3337836
Abstract: As widely used indices in key-value stores, the Log-Structured Merge-tree (LSM-tree) and its variants suffer from severe write amplification due to frequent merges during compaction in write-intensive applications. To address the problem, we first propose the Log-Structured Append-tree (LSA-tree), which compacts data with appends instead of merges, significantly reducing write amplification and solving the issues in existing append trees. However, LSA increases read and space amplification. Building on LSA, we then design the Integrated Append/Merge-tree (IAM-tree). IAM selects appends or merges for each compaction operation according to the size of the memory-cached data. Theoretical analysis shows that IAM reduces the write amplification of LSM while keeping the same read and space amplification. We implement IAM as a user library named IamDB. Experiments show that its write amplification is much lower than LSM's: only 8.71 vs. 19.00 for 1TB of data with 64GB of memory. Compared with well-tuned LevelDB and RocksDB, IamDB provides 1.4-2.7× and 1.6-1.9× better write throughput and saves 12% and 10% of disk space respectively, with comparable read and scan performance. Meanwhile, IamDB achieves the most stable tail latency.
Citations: 6
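The selection rule the abstract describes (append or merge per compaction, driven by the amount of memory-cached data) can be sketched as a single decision function. The 64 MiB threshold below is a placeholder assumption, not a value from the paper.

```python
def choose_compaction(cached_bytes, merge_threshold=64 << 20):
    # Small flushes are cheaper to append (old data stays in place);
    # a large flush amortizes the cost of a full merge, which in turn
    # keeps read and space amplification in check.
    return "merge" if cached_bytes >= merge_threshold else "append"
```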
HOPE
M. Yasugi, Daisuke Muraoka, Tasuku Hiraishi, Seiji Umatani, Kento Emoto
Pub Date: 2019-08-05  DOI: 10.1145/3337821.3337899
Abstract: This paper presents a new approach to fault-tolerant language systems without a single point of failure for irregular parallel applications. Work-stealing frameworks provide good load balancing for many parallel applications, including irregular ones written in a divide-and-conquer style. However, work-stealing frameworks with fault-tolerant features such as checkpointing do not always work well. This paper proposes a completely opposite "work omission" paradigm and, as its more detailed concept, a "hierarchical omission"-based parallel execution model called HOPE. The HOPE programmer's task is to specify which regions in imperative code can be executed sequentially but in arbitrary order, and how their partial results can be accessed. HOPE workers spawn no tasks or threads at all; rather, every worker has the entire work of the program with its own planned execution order, and the workers and the underlying message mediation systems automatically exchange partial results to omit hierarchical subcomputations. Even with fault tolerance, the HOPE framework provides parallel speedups for many parallel applications, including irregular ones.
Citations: 4
PhSIH
Zhengyu Liao, Shiyou Qian, Jian Cao, Yanhua Cao, Guangtao Xue, Jiadi Yu, Yanmin Zhu, Minglu Li
Pub Date: 2019-08-05  DOI: 10.1145/3337821.3337859
Abstract: The matching algorithm is a critical component of a content-based publish/subscribe system, and its performance directly affects the QoS of the whole system. Aiming to improve and stabilize matching performance, we propose a lightweight parallelization method called PhSIH on the basis of three existing algorithms. PhSIH fulfills Parallelization by horizontally Segmenting the Indexing Hierarchy of the data structures, allowing multiple threads to perform matching tasks in parallel on a common data structure. PhSIH can adaptively adjust the degree of parallelism according to changing workloads in order to meet performance requirements. The main work of PhSIH concerns dynamically adjusting the degree of parallelism and computing a task allocation for the parallel threads. PhSIH is implemented in Apache Kafka to augment it into a content-based publish/subscribe system, which makes Kafka suitable for real-time fine-grained event dissemination scenarios such as stock ticks. To evaluate the parallelization effect and adaptability of PhSIH, a series of experiments are conducted on synthetic and real-world data. The results demonstrate that PhSIH achieves a good parallelization effect on the three existing algorithms and possesses the adaptability needed to stabilize the performance of the matching algorithms.
Citations: 7
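The horizontal-segmentation idea (split the matching index into segments that multiple threads scan in parallel for one event) can be illustrated with a toy sketch. The interval-style subscriptions and the `(subscriber, low, high)` shape below are hypothetical, standing in for whichever indexing hierarchy the underlying matching algorithm uses.

```python
from concurrent.futures import ThreadPoolExecutor

def match_segment(segment, event_value):
    # hypothetical interval subscriptions: (subscriber_id, low, high)
    return [sub for (sub, low, high) in segment
            if low <= event_value <= high]

def parallel_match(segments, event_value, parallelism=2):
    # each thread matches the event against one horizontal index segment;
    # PhSIH would additionally tune `parallelism` to the workload
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        parts = pool.map(match_segment, segments,
                         [event_value] * len(segments))
    return [sub for part in parts for sub in part]
```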
AVR
Pub Date: 2019-08-05  DOI: 10.1016/b978-0-7506-5635-1.x5018-3
(No abstract available.)
Citations: 0
Accelerating All-Edge Common Neighbor Counting on Three Processors
Yulin Che, Zhuohang Lai, Shixuan Sun, Qiong Luo, Yue Wang
Pub Date: 2019-08-05  DOI: 10.1145/3337821.3337917
Abstract: We propose to accelerate an important but time-consuming operation in online graph analytics, namely counting the common neighbors of each pair of adjacent vertices (u,v), or edge (u,v), on three modern processors of different architectures. We study two representative algorithms for this problem: (1) a merge-based pivot-skip algorithm (MPS) that intersects the two neighbor sets of each edge (u,v) to obtain the count; and (2) a bitmap-based algorithm (BMP), which dynamically constructs a bitmap index on the neighbor set of each vertex u and, for each neighbor v of u, looks up v's neighbors in u's bitmap. We parallelize and optimize both algorithms on a multicore CPU, an Intel Xeon Phi Knights Landing processor (KNL), and an NVIDIA GPU. Our experiments show that (1) both the CPU and the GPU favor BMP, whereas MPS wins on the KNL; (2) across all datasets, the best performer is either MPS on the KNL or BMP on the GPU; and (3) our optimized algorithms can complete the operation within tens of seconds on billion-edge Twitter graphs, enabling online analytics.
Citations: 4
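Sequential versions of the two counting strategies are easy to sketch. The merge loop below omits the pivot-skip (galloping) optimization of MPS, and a Python set stands in for the per-vertex bitmap of BMP, so both are simplified illustrations rather than the paper's optimized kernels; neighbor lists are assumed sorted.

```python
def mps_count(adj, u, v):
    # Merge-based intersection of the two sorted neighbor lists.
    a, b = adj[u], adj[v]
    i = j = count = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            count += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return count

def bmp_count(adj, u, v):
    # Index u's neighbors once, then probe each neighbor of v.
    marked = set(adj[u])
    return sum(1 for w in adj[v] if w in marked)
```

On dense neighborhoods the index-and-probe style does constant-time lookups per probe, while the merge style benefits from skipping long runs, which is roughly why the best choice differs across the three processors.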