2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS): Latest Publications

Exploring the Binary Precision Capabilities of Tensor Cores for Epistasis Detection
2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS) Pub Date : 2020-05-01 DOI: 10.1109/IPDPS47924.2020.00043
Ricardo Nobre, A. Ilic, Sergio Santander-Jiménez, L. Sousa
{"title":"Exploring the Binary Precision Capabilities of Tensor Cores for Epistasis Detection","authors":"Ricardo Nobre, A. Ilic, Sergio Santander-Jiménez, L. Sousa","doi":"10.1109/IPDPS47924.2020.00043","DOIUrl":"https://doi.org/10.1109/IPDPS47924.2020.00043","url":null,"abstract":"Genome-wide association studies are performed to correlate a number of diseases and other physical or even psychological conditions (phenotype) with substitutions of nucleotides at specific positions in the human genome, mainly single-nucleotide polymorphisms (SNPs). Some conditions, possibly because of the complexity of the mechanisms that give rise to them, have been identified to be more statistically correlated with genotype when multiple SNPs are jointly taken into account. However, the discovery of new associations between genotype and phenotype is exponentially slowed down by the increase of computational power required when epistasis, i.e., interactions between SNPs, is considered. This paper proposes a novel graphics processing unit (GPU)-based approach for epistasis detection that combines the use of modern tensor cores with native support for processing binarized inputs with algorithmic and target-focused optimizations. Using only a single mid-range Turing-based GPU, the proposed approach is able to evaluate 64.8×1012 and 25.4×1012 sets of SNPs per second, normalized to the number of patients, when considering 2-way and 3-way epistasis detection, respectively. This proposal is able to surpass the state-of-the-art approach by 6× and 8.2× in terms of the number of pairs and triplets of SNP allelic patient data evaluated per unit of time per GPU.","PeriodicalId":6805,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"1 1","pages":"338-347"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90905125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 14
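The counting kernel at the heart of this approach can be illustrated with a minimal CPU-side sketch: genotypes are one-hot encoded as bit vectors over patients, and the joint genotype counts needed for a SNP pair reduce to popcounts of bitwise ANDs, the operation that 1-bit tensor-core matrix multiplies accelerate on Turing GPUs. The NumPy code below is an illustration of that idea with toy data, not the authors' implementation.

```python
# Minimal CPU sketch of binarized joint-genotype counting (illustrative only).
# Each SNP/genotype-class pair becomes a bit vector over patients; the 3x3
# contingency table for a SNP pair is popcount(AND) over those vectors, which
# is exactly what 1-bit tensor-core matrix multiplies accelerate on a GPU.
import numpy as np

def binarize(genotypes, n_classes=3):
    """genotypes: (n_snps, n_patients) array with values in {0, 1, 2}.
    Returns a boolean array of shape (n_snps, n_classes, n_patients)."""
    n_snps, n_patients = genotypes.shape
    bits = np.zeros((n_snps, n_classes, n_patients), dtype=bool)
    for c in range(n_classes):
        bits[:, c, :] = (genotypes == c)
    return bits

def pair_contingency(bits, i, j, n_classes=3):
    """Joint genotype counts for SNPs i and j via AND + popcount."""
    counts = np.zeros((n_classes, n_classes), dtype=np.int64)
    for a in range(n_classes):
        for b in range(n_classes):
            counts[a, b] = np.count_nonzero(bits[i, a] & bits[j, b])
    return counts

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    geno = rng.integers(0, 3, size=(10, 64))   # 10 SNPs, 64 patients (toy data)
    b = binarize(geno)
    print(pair_contingency(b, 0, 1))           # entries sum to 64
```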
Packet-in Request Redirection for Minimizing Control Plane Response Time
2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS) Pub Date : 2020-05-01 DOI: 10.1109/IPDPS47924.2020.00099
Rui Xia, Haipeng Dai, Jiaqi Zheng, Hong Xu, M. Li, Guihai Chen
{"title":"Packet-in Request Redirection for Minimizing Control Plane Response Time","authors":"Rui Xia, Haipeng Dai, Jiaqi Zheng, Hong Xu, M. Li, Guihai Chen","doi":"10.1109/IPDPS47924.2020.00099","DOIUrl":"https://doi.org/10.1109/IPDPS47924.2020.00099","url":null,"abstract":"A distributed control plane is more scalable and robust in software defined networking. This paper focuses on controller load balancing using packet-in request redirection, that is, given the instantaneous state of the system, determining whether to redirect packet-in requests for each switch, such that the overall control plane response time (CPRT) is minimized. To address the above problem, we propose a framework based on Lyapunov optimization. First, we use the drift-plus-penalty algorithm to combine CPRT minimization problem with controller capacity constraints, and further derive a non-linear program, whose optimal solution is obtained with brute force using standard linearization techniques. Second, we present a greedy strategy to efficiently obtain a solution with a bounded approximation ratio. Third, we reformulate the program as a problem of maximizing a non-monotone submodular function subject to matroid constraints. We implement a controller proto-type for packet-in request redirection, and conduct trace-driven simulations to validate our theoretical results. The results show that our algorithms can reduce the average CPRT by 81.6% compared to static controller-switch assignment, and achieve a 3× improvement in maximum controller capacity violation ratio.","PeriodicalId":6805,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"87 1","pages":"926-935"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80830675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
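For intuition, a greedy redirection heuristic in this spirit can be sketched as follows: switches are assigned to controllers one at a time, each taking the controller that minimizes an estimated response time while respecting capacity. The queueing estimate and the ordering below are assumptions for illustration; the paper's drift-plus-penalty and submodular formulations are more involved.

```python
# Illustrative greedy redirection heuristic (not the paper's exact algorithm):
# assign each switch's packet-in rate to the controller that yields the smallest
# estimated response time while respecting a hard capacity constraint.
def greedy_assign(request_rates, capacities):
    """request_rates: dict switch -> packet-in requests/s.
    capacities: list of controller service rates (requests/s).
    Returns dict switch -> controller index."""
    load = [0.0] * len(capacities)
    assignment = {}
    # Place heavy switches first so they see the least-loaded controllers.
    for sw, rate in sorted(request_rates.items(), key=lambda kv: -kv[1]):
        best, best_rt = None, float("inf")
        for c, cap in enumerate(capacities):
            if load[c] + rate >= cap:              # would violate capacity
                continue
            rt = 1.0 / (cap - (load[c] + rate))    # M/M/1-style response time
            if rt < best_rt:
                best, best_rt = c, rt
        if best is None:
            raise RuntimeError(f"no controller can absorb switch {sw}")
        assignment[sw] = best
        load[best] += rate
    return assignment

print(greedy_assign({"s1": 300.0, "s2": 500.0, "s3": 200.0}, [1000.0, 800.0]))
```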
Understanding and Improving Persistent Transactions on Optane™ DC Memory
2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS) Pub Date : 2020-05-01 DOI: 10.1109/IPDPS47924.2020.00044
P. Zardoshti, Michael F. Spear, A. Vosoughi, G. Swart
{"title":"Understanding and Improving Persistent Transactions on Optane™ DC Memory","authors":"P. Zardoshti, Michael F. Spear, A. Vosoughi, G. Swart","doi":"10.1109/IPDPS47924.2020.00044","DOIUrl":"https://doi.org/10.1109/IPDPS47924.2020.00044","url":null,"abstract":"Storing data structures in high-capacity byte-addressable persistent memory instead of DRAM or a storage device offers the opportunity to (1) reduce cost and power consumption compared with DRAM, (2) decrease the latency and CPU resources needed for an I/O operation compared with storage, and (3) allow for fast recovery as the data structure remains in memory after a machine failure. The first commercial offering in this space is Intel® Optane™ Direct Connect (Optane™ DC) Persistent Memory. Optane™ DC promises access time within a constant factor of DRAM, with larger capacity, lower energy consumption, and persistence. We present an experimental evaluation of persistent transactional memory performance, and explore how Optane™ DC durability domains affect the overall results. Given that neither of the two available durability domains can deliver performance competitive with DRAM, we introduce and emulate a new durability domain, called PDRAM, in which the memory controller tracks enough information (and has enough reserve power) to make DRAM behave like a persistent cache of Optane™ DC memory.In this paper we compare the performance of these durability domains on several configurations of five persistent transactional memory applications. We find a large throughput difference, which emphasizes the importance of choosing the best durability domain for each application and system. At the same time, our results confirm that recently published persistent transactional memory algorithms are able to scale, and that recent optimizations for these algorithms lead to strong performance, with speedups as high as 6× at 16 threads.","PeriodicalId":6805,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"29 1","pages":"348-357"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91166476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
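As background, a persistent transaction typically orders its stores with explicit write-backs and fences so that the undo (or redo) log is durable before data is modified in place; which of those flushes are actually needed depends on the durability domain. The toy Python sketch below simulates that ordering with a no-op persist() standing in for clwb plus sfence; it illustrates the general technique only and is not the paper's implementation.

```python
# Toy undo-log "persistent transaction" (illustration of the general technique,
# not the paper's code). persist() stands in for a cache-line write-back plus
# store fence (clwb + sfence); with a durability domain that covers the CPU
# caches, those flushes become unnecessary.
class PersistentDict:
    def __init__(self):
        self.data = {}        # pretend this lives in persistent memory
        self.undo_log = {}    # pretend this does too

    def persist(self, region):
        # Real code would write back and fence the dirty cache lines of `region`.
        pass

    def tx_write(self, key, value):
        if key not in self.undo_log:
            self.undo_log[key] = self.data.get(key)   # capture the old value
            self.persist(self.undo_log)               # log must be durable first
        self.data[key] = value                        # then update in place

    def tx_commit(self):
        self.persist(self.data)       # make the new values durable
        self.undo_log.clear()         # only then discard the undo log
        self.persist(self.undo_log)

d = PersistentDict()
d.tx_write("balance", 100)
d.tx_commit()
print(d.data)                         # {'balance': 100}
```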
Efficient I/O for Neural Network Training with Compressed Data
2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS) Pub Date : 2020-05-01 DOI: 10.1109/IPDPS47924.2020.00050
Zhao Zhang, Lei Huang, J. G. Pauloski, Ian T Foster
{"title":"Efficient I/O for Neural Network Training with Compressed Data","authors":"Zhao Zhang, Lei Huang, J. G. Pauloski, Ian T Foster","doi":"10.1109/IPDPS47924.2020.00050","DOIUrl":"https://doi.org/10.1109/IPDPS47924.2020.00050","url":null,"abstract":"FanStore is a shared object store that enables efficient and scalable neural network training on supercomputers. By providing a global cache layer on node-local burst buffers using a compressed representation, it significantly enhances the processing capability of deep learning (DL) applications on existing hardware. In addition, FanStore allows POSIX-compliant file access to the compressed data in user space. We investigate the tradeoff between runtime overhead and data compression ratio using real-world datasets and applications, and propose a compressor selection algorithm to maximize storage capacity given performance constraints. We consider both asynchronous (i.e., with prefetching) and synchronous I/O strategies, and propose mechanisms for selecting compressors for both approaches. Using FanStore, the same storage hardware can host 2–13× more data for example applications without significant runtime overhead. Empirically, our experiments show that FanStore scales to 512 compute nodes with near linear performance scalability.","PeriodicalId":6805,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"87 1","pages":"409-418"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85998311","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
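A compressor-selection rule in the spirit of FanStore's goal, maximizing stored capacity subject to a performance constraint, can be sketched as follows. The candidate numbers and the simple feasibility test are assumptions for illustration, not the paper's published algorithm.

```python
# Hedged sketch of a capacity-vs-throughput compressor choice: pick the best
# compression ratio among candidates whose decompression rate still keeps the
# training loop fed. The rule and the numbers are illustrative assumptions.
def select_compressor(candidates, required_mbps):
    """candidates: list of (name, compression_ratio, decompress_mbps).
    Returns the name of the chosen compressor, or None if none keeps up."""
    feasible = [c for c in candidates if c[2] >= required_mbps]
    if not feasible:
        return None
    return max(feasible, key=lambda c: c[1])[0]

candidates = [
    ("lz4",    2.1, 3800.0),   # (name, ratio, decompress MB/s), made-up numbers
    ("zstd-1", 2.8, 1500.0),
    ("zstd-9", 3.4,  600.0),
]
print(select_compressor(candidates, required_mbps=1000.0))   # -> zstd-1
```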
FP4S: Fragment-based Parallel State Recovery for Stateful Stream Applications
2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS) Pub Date : 2020-05-01 DOI: 10.1109/IPDPS47924.2020.00116
Pinchao Liu, Hailu Xu, D. D. Silva, Qingyang Wang, Sarker Tanzir Ahmed, Liting Hu
{"title":"FP4S: Fragment-based Parallel State Recovery for Stateful Stream Applications","authors":"Pinchao Liu, Hailu Xu, D. D. Silva, Qingyang Wang, Sarker Tanzir Ahmed, Liting Hu","doi":"10.1109/IPDPS47924.2020.00116","DOIUrl":"https://doi.org/10.1109/IPDPS47924.2020.00116","url":null,"abstract":"Streaming computations are by nature long-running. They run in highly dynamic distributed environments where many stream operators may leave or fail at the same time. Most of them are stateful, in which stream operators need to store and maintain large-sized state in memory, resulting in expensive time and space costs to recover them. The state-of-the-art stream processing systems offer failure recovery mainly through three approaches: replication recovery, checkpointing recovery, and DStream-based lineage recovery, which are either slow, resource-expensive or fail to handle many simultaneous failures.We present FP4S, a novel fragment-based parallel state recovery mechanism that can handle many simultaneous failures for a large number of concurrently running stream applications. The novelty of FP4S is that we organize all the application’s operators into a distributed hash table (DHT) based consistent ring to associate each operator with a unique set of neighbors. Then we divide each operator’s in-memory state into many fragments and periodically save them in each node’s neighbors, ensuring that different sets of available fragments can reconstruct lost state in parallel. This approach makes this failure recovery mechanism extremely scalable, and allows it to tolerate many simultaneous operator failures. We apply FP4S on Apache Storm and evaluate it using large-scale real-world experiments, which demonstrate its scalability, efficiency, and fast failure recovery features. When compared to the state-of-the-art solutions (Apache Storm), FP4S reduces 37.8% latency of state recovery and saves more than half of the hardware costs. It can scale to many simultaneous failures and successfully recover the states when up to 66.6% of states fail or get lost.","PeriodicalId":6805,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"51 1","pages":"1102-1111"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87290235","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
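The fragment-and-scatter idea can be sketched with a consistent-hash ring and simple XOR parity: an operator's state is split into fragments plus one parity fragment and placed on the operator's ring successors, so the state survives the loss of any single fragment. FP4S uses a real DHT and can tolerate more failures; the helper names, sizes, and the single-parity scheme below are illustrative assumptions only.

```python
# Illustrative fragment-and-scatter sketch: operators hash onto a consistent
# ring, an operator's state is split into k fragments plus one XOR parity
# fragment, and the pieces are placed on its ring successors.
import hashlib

def ring_position(node_id, ring_bits=32):
    return int(hashlib.sha1(node_id.encode()).hexdigest(), 16) % (1 << ring_bits)

def successors(all_nodes, owner, k):
    """k ring successors of `owner` under consistent hashing."""
    ring = sorted(all_nodes, key=ring_position)
    i = ring.index(owner)
    return [ring[(i + j) % len(ring)] for j in range(1, k + 1)]

def fragment(state: bytes, k: int):
    """Split state into k equal-sized fragments plus an XOR parity fragment."""
    size = -(-len(state) // k)                                    # ceil division
    frags = [state[i * size:(i + 1) * size].ljust(size, b"\0") for i in range(k)]
    parity = bytearray(size)
    for frag in frags:
        for idx, byte in enumerate(frag):
            parity[idx] ^= byte
    return frags, bytes(parity)

def recover_missing(frags, parity, missing_index):
    """Rebuild one lost data fragment from the parity and the survivors."""
    rebuilt = bytearray(parity)
    for i, frag in enumerate(frags):
        if i == missing_index:
            continue
        for idx, byte in enumerate(frag):
            rebuilt[idx] ^= byte
    return bytes(rebuilt)

nodes = [f"node{i}" for i in range(8)]
frags, parity = fragment(b"operator window state ...", 3)
placement = dict(zip(successors(nodes, "node0", 4), frags + [parity]))
print(list(placement.keys()))
print(recover_missing(frags, parity, 1) == frags[1])   # True
```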
GPU-Based Static Data-Flow Analysis for Fast and Scalable Android App Vetting
2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS) Pub Date : 2020-05-01 DOI: 10.1109/IPDPS47924.2020.00037
Xiaodong Yu, Fengguo Wei, Xinming Ou, M. Becchi, Tekin Bicer, D. Yao
{"title":"GPU-Based Static Data-Flow Analysis for Fast and Scalable Android App Vetting","authors":"Xiaodong Yu, Fengguo Wei, Xinming Ou, M. Becchi, Tekin Bicer, D. Yao","doi":"10.1109/IPDPS47924.2020.00037","DOIUrl":"https://doi.org/10.1109/IPDPS47924.2020.00037","url":null,"abstract":"Many popular vetting tools for Android applications use static code analysis techniques. In particular, Interprocedural Data-Flow Graph (IDFG) construction is the computation at the core of Android static data-flow analysis and consumes most of the analysis time. Many analysis tools use a worklist algorithm, an iterative fixed-point approach, to construct the IDFG. In this paper, we observe that a straightforward GPU parallelization of the worklist algorithm leads to significant underutilization of the GPU resources. We identify four performance bottlenecks, namely, frequent dynamic memory allocations, high branch divergence, workload imbalance, and irregular memory access patterns. Accordingly, we propose GDroid, a GPU-based worklist algorithm implementation with multiple fine-grained optimizations tailored to common characteristics of Android applications. The optimizations considered are: matrix-based data structure, memory access-based node grouping, and worklist merging. Our experimental evaluation, performed on 1000 Android applications, shows that the proposed optimizations are beneficial to performance, and GDroid can achieve up to 128X speedups against a plain GPU implementation.","PeriodicalId":6805,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"131 1","pages":"274-284"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79629826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
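The worklist algorithm that GDroid parallelizes is, at its core, an iterative fixed-point computation: pop a node, apply its transfer function, and re-enqueue successors whose incoming facts grew. The sequential sketch below uses a placeholder graph and transfer function; it shows the algorithmic skeleton, not the tool's actual Android analysis.

```python
# Sequential worklist fixed-point sketch of the kind of computation GDroid
# runs on the GPU; the toy graph and transfer function are placeholders.
def worklist_analysis(successors, transfer, entry):
    """successors: node -> list of successor nodes.
    transfer: (node, incoming facts) -> outgoing facts.
    Returns a dict: node -> set of data-flow facts reaching that node."""
    in_facts = {entry: frozenset()}
    work = [entry]
    while work:
        node = work.pop()
        out = transfer(node, in_facts.get(node, frozenset()))
        for succ in successors.get(node, []):
            old = in_facts.get(succ, frozenset())
            new = old | out
            if new != old:            # facts grew, so the successor must be redone
                in_facts[succ] = new
                work.append(succ)
    return in_facts

# Toy example: each node "generates" its own name as a fact.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(worklist_analysis(graph, lambda n, facts: facts | {n}, "a"))
```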
IPDPS 2020 Index
2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS) Pub Date : 2020-05-01 DOI: 10.1109/ipdps47924.2020.00121
{"title":"IPDPS 2020 Index","authors":"","doi":"10.1109/ipdps47924.2020.00121","DOIUrl":"https://doi.org/10.1109/ipdps47924.2020.00121","url":null,"abstract":"","PeriodicalId":6805,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"106 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81185082","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
IPDPS 2020 Breaker Page
2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS) Pub Date : 2020-05-01 DOI: 10.1109/ipdps47924.2020.00003
{"title":"IPDPS 2020 Breaker Page","authors":"","doi":"10.1109/ipdps47924.2020.00003","DOIUrl":"https://doi.org/10.1109/ipdps47924.2020.00003","url":null,"abstract":"","PeriodicalId":6805,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"114 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86242618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
IPDPS 2020 Commentary
2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS) Pub Date : 2020-05-01 DOI: 10.1109/ipdps47924.2020.00001
{"title":"IPDPS 2020 Commentary","authors":"","doi":"10.1109/ipdps47924.2020.00001","DOIUrl":"https://doi.org/10.1109/ipdps47924.2020.00001","url":null,"abstract":"","PeriodicalId":6805,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"15 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78275421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
An Active Learning Method for Empirical Modeling in Performance Tuning
2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS) Pub Date : 2020-05-01 DOI: 10.1109/IPDPS47924.2020.00034
Jiepeng Zhang, Jingwei Sun, Wenju Zhou, Guangzhong Sun
{"title":"An Active Learning Method for Empirical Modeling in Performance Tuning","authors":"Jiepeng Zhang, Jingwei Sun, Wenju Zhou, Guangzhong Sun","doi":"10.1109/IPDPS47924.2020.00034","DOIUrl":"https://doi.org/10.1109/IPDPS47924.2020.00034","url":null,"abstract":"Tuning performance of scientific applications is a challenging problem since performance can be a complicated nonlinear function with respect to application parameters. Empirical performance modeling is a useful approach to approximate the function and enable efficient heuristic methods to find sub-optimal parameter configurations. However, empirical performance modeling requires a large number of samples from the parameter space, which is resource and time-consuming. To address this issue, existing work based on active learning techniques proposed PBU Sampling method considering performance before uncertainty, which iteratively performs performance biased sampling to model the high-performance subspace instead of the entire space before evaluating the most uncertain samples to reduce redundancy. Compared with uniformly random sampling, this approach can reduce the number of samples, but it still involves redundant sampling that potentially can be improved.We propose a novel active learning based method to exploit the information of evaluated samples and explore possible high-performance parameter configurations. Specifically, we adopt a Performance Weighted Uncertainty (PWU) sampling strategy to identify the configurations with either high performance or high uncertainty and determine which ones are selected for evaluation. To evaluate the effectiveness of our proposed method, we construct random forest to predict the execution time of kernels from SPAPT suite and two typical scientific parallel applications kripke, hypre. Experimental results show that compared with existing methods, our proposed method can reduce the cost of modeling by up to 21x and 3x on average meanwhile hold the same prediction accuracy.","PeriodicalId":6805,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"5 1","pages":"244-253"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78079407","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
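The PWU idea can be sketched with a random forest whose per-tree disagreement serves as the uncertainty estimate and whose mean prediction provides the performance weight; candidates are ranked by the product of the two. The exact weighting below (inverse predicted runtime) is an assumption for illustration, not necessarily the paper's formula.

```python
# Illustrative performance-weighted-uncertainty sampling: a random forest
# predicts runtimes for unevaluated configurations, per-tree disagreement is
# the uncertainty, and candidates that look both fast and uncertain score
# highest. The inverse-runtime weight is an assumed stand-in for PWU.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def pwu_select(model, candidates, n_pick=1):
    """candidates: (n, d) array of parameter configurations; returns indices."""
    per_tree = np.stack([tree.predict(candidates) for tree in model.estimators_])
    mean_time = per_tree.mean(axis=0)
    uncertainty = per_tree.std(axis=0)
    performance = 1.0 / (mean_time + 1e-9)   # shorter predicted runtime => larger weight
    score = performance * uncertainty
    return np.argsort(score)[::-1][:n_pick]

rng = np.random.default_rng(0)
X = rng.uniform(size=(40, 3))                                        # evaluated configs
y = 1.0 + X[:, 0] + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=40)   # measured runtimes
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
pool = rng.uniform(size=(200, 3))                                    # unevaluated configs
print(pwu_select(model, pool, n_pick=3))                             # next configs to run
```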