SC14: International Conference for High Performance Computing, Networking, Storage and Analysis: Latest Papers, Page 7

Practical Symbolic Race Checking of GPU Programs

SC14: International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2014-11-16 DOI: 10.1109/SC.2014.20

Peng Li, Guodong Li, G. Gopalakrishnan

Citations: 30

Using an Adaptive HPC Runtime System to Reconfigure the Cache Hierarchy

SC14: International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2014-11-16 DOI: 10.1109/SC.2014.90

E. Totoni, J. Torrellas, L. Kalé

Citations: 13

Understanding Soft Error Resiliency of Blue Gene/Q Compute Chip through Hardware Proton Irradiation and Software Fault Injection

SC14: International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2014-11-16 DOI: 10.1109/SC.2014.53

Chen-Yong Cher, M. Gupta, P. Bose, K. Muller

Citations: 33

A System Software Approach to Proactive Memory-Error Avoidance

SC14: International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2014-11-16 DOI: 10.1109/SC.2014.63

Carlos H. A. Costa, Yoonho Park, Bryan S. Rosenburg, Chen-Yong Cher, K. D. Ryu

Citations: 29

Best Practices and Lessons Learned from Deploying and Operating Large-Scale Data-Centric Parallel File Systems

SC14: International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2014-11-16 DOI: 10.1109/SC.2014.23

S. Oral, James Simmons, Jason Hill, Dustin Leverman, Feiyi Wang, M. Ezell, Ross G. Miller, Douglas Fuller, Raghul Gunasekaran, Youngjae Kim, Saurabh Gupta, Devesh Tiwari, Sudharshan S. Vazhkudai, James H. Rogers, D. Dillow, G. Shipman, Arthur S. Bland

Citations: 41

Scalable Kernel Fusion for Memory-Bound GPU Applications

SC14: International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2014-11-16 DOI: 10.1109/SC.2014.21

M. Wahib, N. Maruyama

Citations: 73

A Communication-Optimal Framework for Contracting Distributed Tensors

SC14: International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2014-11-16 DOI: 10.1109/SC.2014.36

Samyam Rajbhandari, Akshay Nikam, Pai-Wei Lai, Kevin Stock, S. Krishnamoorthy, P. Sadayappan

Citations: 27

Dissecting On-Node Memory Access Performance: A Semantic Approach

SC14: International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2014-11-16 DOI: 10.1109/SC.2014.19

Alfredo Giménez, T. Gamblin, B. Rountree, A. Bhatele, Ilir Jusufi, P. Bremer, B. Hamann

Abstract: Optimizing memory access is critical for performance and power efficiency. CPU manufacturers have developed sampling-based performance measurement units (PMUs) that report precise costs of memory accesses at specific addresses. However, this data is too low-level to be meaningfully interpreted and contains an excessive amount of irrelevant or uninteresting information. We have developed a method to gather fine-grained memory access performance data for specific data objects and regions of code with low overhead and attribute semantic information to the sampled memory accesses. This information provides the context necessary to more effectively interpret the data. We have developed a tool that performs this sampling and attribution and used the tool to discover and diagnose performance problems in real-world applications. Our techniques provide useful insight into the memory behaviour of applications and allow programmers to understand the performance ramifications of key design decisions: domain decomposition, multi-threading, and data motion within distributed memory systems.

Citations: 30

A Volume Integral Equation Stokes Solver for Problems with Variable Coefficients

SC14: International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2014-11-16 DOI: 10.1109/SC.2014.13

D. Malhotra, A. Gholami, G. Biros

Abstract: We present a novel numerical scheme for solving the Stokes equation with variable coefficients in the unit box. Our scheme is based on a volume integral equation formulation. Compared to finite element methods, our formulation decouples the velocity and pressure, generates velocity fields that are by construction divergence-free to high accuracy, and its performance does not depend on the order of the basis used for discretization. In addition, we employ a novel adaptive fast multipole method for volume integrals to obtain a scheme that is algorithmically optimal. Our scheme supports non-uniform discretizations and is spectrally accurate. To increase per-node performance, we have integrated our code with both NVIDIA and Intel accelerators. In our largest scalability test, we solved a problem with 20 billion unknowns, using a 14th-order approximation for the velocity, on 2048 nodes of the Stampede system at the Texas Advanced Computing Center. We achieved 0.656 petaFLOPS for the overall code (23% efficiency) and one petaFLOPS for the volume integrals (33% efficiency). As an application example, we simulate Stokes flow in a porous medium with highly complex pore structure using a penalty formulation to enforce the no-slip condition.

Citations: 14

FlexSlot: Moving Hadoop Into the Cloud with Flexible Slot Management

SC14: International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2014-11-16 DOI: 10.1109/SC.2014.83

Yanfei Guo, J. Rao, Changjun Jiang, Xiaobo Zhou

Citations: 25