SC20: International Conference for High Performance Computing, Networking, Storage and Analysis: Latest Papers, Page 2

A Hierarchical and Load-Aware Design for Large Message Neighborhood Collectives

SC20: International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2020-11-01 DOI: 10.1109/SC41405.2020.00038

S. M. Ghazimirsaeed, Qinghua Zhou, Amit Ruhela, Mohammadreza Bayatpour

Citations: 3

SegAlign: A Scalable GPU-Based Whole Genome Aligner

SC20: International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2020-11-01 DOI: 10.1109/SC41405.2020.00043

Sneha D. Goenka, Yatish Turakhia, B. Paten, M. Horowitz

Citations: 11

Scalable yet Rigorous Floating-Point Error Analysis

SC20: International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2020-11-01 DOI: 10.1109/SC41405.2020.00055

Arnab Das, Ian Briggs, G. Gopalakrishnan, S. Krishnamoorthy, P. Panchekha

Citations: 22

Kraken: Memory-Efficient Continual Learning for Large-Scale Real-Time Recommendations

SC20: International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2020-11-01 DOI: 10.1109/SC41405.2020.00025

Minhui Xie, Kai Ren, Youyou Lu, Guangxu Yang, Qingxing Xu, Bihai Wu, Jiazhen Lin, H. Ao, Wanhong Xu, J. Shu

Abstract: Modern recommendation systems in industry often use deep learning (DL) models that achieve better model accuracy with more data and model parameters. However, current open-source DL frameworks, such as TensorFlow and PyTorch, show relatively low scalability on training recommendation models with terabytes of parameters. To efficiently learn large-scale recommendation models from data streams that generate hundreds of terabytes training data daily, we introduce a continual learning system called Kraken. Kraken contains a special parameter server implementation that dynamically adapts to the rapidly changing set of sparse features for the continual training and serving of recommendation models. Kraken provides a sparsity-aware training system that uses different learning optimizers for dense and sparse parameters to reduce memory overhead. Extensive experiments using real-world datasets confirm the effectiveness and scalability of Kraken. Kraken can benefit the accuracy of recommendation tasks with the same memory resources, or trisect the memory usage while keeping model performance.

Citations: 22

Distributed-Memory Parallel Symmetric Nonnegative Matrix Factorization

SC20: International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2020-11-01 DOI: 10.1109/SC41405.2020.00078

Srinivas Eswar, Koby Hayashi, Grey Ballard, R. Kannan, R. Vuduc, Haesun Park

Citations: 5

PLINER: Isolating Lines of Floating-Point Code for Compiler-Induced Variability

SC20: International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2020-11-01 DOI: 10.1109/SC41405.2020.00053

Hui Guo, I. Laguna, Cindy Rubio-González

Citations: 9

RLScheduler: An Automated HPC Batch Job Scheduler Using Reinforcement Learning

SC20: International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2020-11-01 DOI: 10.1109/SC41405.2020.00035

Di Zhang, Dong Dai, Youbiao He, F. S. Bao, Bing Xie

Abstract: Today’s high-performance computing (HPC) platforms are still dominated by batch jobs. Accordingly, effective batch job scheduling is crucial to obtain high system efficiency. Existing HPC batch job schedulers typically leverage heuristic priority functions to prioritize and schedule jobs. But, once configured and deployed by the experts, such priority functions can hardly adapt to the changes of job loads, optimization goals, or system settings, potentially leading to degraded system efficiency when changes occur. To address this fundamental issue, we present RLScheduler, an automated HPC batch job scheduler built on reinforcement learning. RLScheduler relies on minimal manual interventions or expert knowledge, but can learn high-quality scheduling policies via its own continuous ‘trial and error’. We introduce a new kernel-based neural network structure and trajectory filtering mechanism in RLScheduler to improve and stabilize the learning process. Through extensive evaluations, we confirm that RLScheduler can learn high-quality scheduling policies towards various workloads and various optimization goals with relatively low computation cost. Moreover, we show that the learned models perform stably even when applied to unseen workloads, making them practical for production use.

Citations: 41

Scalable Heterogeneous Execution of a Coupled-Cluster Model with Perturbative Triples

SC20: International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2020-11-01 DOI: 10.1109/SC41405.2020.00083

Jinsung Kim, Ajay Panyala, B. Peng, K. Kowalski, P. Sadayappan, S. Krishnamoorthy

Citations: 4

GEMS: GPU-Enabled Memory-Aware Model-Parallelism System for Distributed DNN Training

SC20: International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2020-11-01 DOI: 10.1109/SC41405.2020.00049

Arpan Jain, A. Awan, Asmaa Aljuhani, J. Hashmi, Quentin G. Anthony, H. Subramoni, D. Panda, R. Machiraju, A. Parwani

Citations: 28

Foresight: Analysis That Matters for Data Reduction

SC20: International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2020-11-01 DOI: 10.1109/SC41405.2020.00087

Pascal Grosset, C. Biwer, Jesus Pulido, A. Mohan, Ayan Biswas, J. Patchett, Terece L. Turton, D. Rogers, D. Livescu, J. Ahrens

Abstract: As the computation power of supercomputers increases, so does simulation size, which in turn produces orders-of-magnitude more data. Because generated data often exceed the simulation’s disk quota, many simulations would stand to benefit from data-reduction techniques to reduce storage requirements. Such techniques include autoencoders, data compression algorithms, and sampling. Lossy compression techniques can significantly reduce data size, but such techniques come at the expense of losing information that could result in incorrect post hoc analysis results. To help scientists determine the best compression they can get while keeping their analyses accurate, we have developed Foresight, an analysis framework that enables users to evaluate how different data-reduction techniques will impact their analyses. We use particle data from a cosmology simulation, turbulence data from Direct Numerical Simulation, and asteroid impact data from xRage to demonstrate how Foresight can help scientists determine the best data-reduction technique for their simulations.

Citations: 19