2012 SC Companion: High Performance Computing, Networking Storage and Analysis: Latest Literature (Page 5)

Incremental and Parallel Analytics on Astrophysical Data Streams

2012 SC Companion: High Performance Computing, Networking Storage and Analysis Pub Date : 2012-11-10 DOI: 10.1109/SC.Companion.2012.130

D. Mishin, T. Budavári, A. Szalay, Yanif Ahmad

Citations: 2

Designing a Collaborative Filtering Recommender on the Single Chip Cloud Computer

2012 SC Companion: High Performance Computing, Networking Storage and Analysis Pub Date : 2012-11-10 DOI: 10.1109/SC.Companion.2012.118

Aalap Tripathy, Atish Patra, S. Mohan, R. Mahapatra

Abstract: Fast response requirements for big-data applications on cloud infrastructures continue to grow. At the same time, many cores on-chip have now become a reality. These developments are set to redefine infrastructure nodes of cloud data centers in the future. For this to happen, parallel programming runtimes need to be designed for many-cores on chip as the target architecture. In this paper, we show that the commonly used MapReduce programming paradigm can be adapted to run on Intel's experimental single chip cloud computer (SCC) with 48 cores on chip. We demonstrate this using a Collaborative Filtering (CF) recommender system as an application. CF is a widely used information-filtering technique that predicts a user's preference for an unknown item from their past ratings. These systems are typically deployed in distributed clusters and operate on large a priori datasets. We address scalability with data partitioning, combining, and sorting algorithms, and maximize data locality to minimize communication cost within the SCC cores. We demonstrate ~2x speedup and ~94% lower energy consumption for benchmark workloads compared to a distributed cluster of single- and multi-processor nodes.

Pages: 838-847

Citations: 4

SAN Optimization for High Performance Storage with RDMA Data Transfer

2012 SC Companion: High Performance Computing, Networking Storage and Analysis Pub Date : 2012-11-10 DOI: 10.1109/SC.Companion.2012.15

Jae-Woo Choi, Youngjin Yu, Hyeonsang Eom, H. Yeom, Dongin Shin

Abstract: Today's server environments consist of many machines forming clusters for distributed computing, or storage area networks (SANs) for effectively processing or storing enormous amounts of data. In these server environments, backend storage is usually the bottleneck of the overall system. Simply replacing the devices with better ones is not enough to exploit their performance benefits; proper optimizations are needed to fully realize the gains. In this work, we first applied a high-performance device as backend storage in an existing SAN solution and found that it could not utilize the low latency and high bandwidth of the device, especially for small random I/O patterns, even though a high-speed network was used. To address this problem, we propose a new design that contains three optimizations: 1) removing software overheads to lower I/O latency; 2) parallelism to utilize the high bandwidth of the device; 3) a temporal merge mechanism to reduce network overhead. We implemented them as a prototype and found that our solution makes substantial performance improvements in terms of both latency and bandwidth.

Pages: 24-29

Citations: 1

Abstract: Hybrid Breadth First Search Implementation for Hybrid-Core Computers

2012 SC Companion: High Performance Computing, Networking Storage and Analysis Pub Date : 2012-11-10 DOI: 10.1109/SC.Companion.2012.184

Kevin R. Wadleigh, John Amelio, K. Collins, G. Edwards

Citations: 4

The SDAV Software Frameworks for Visualization and Analysis on Next-Generation Multi-Core and Many-Core Architectures

2012 SC Companion: High Performance Computing, Networking Storage and Analysis Pub Date : 2012-11-10 DOI: 10.1109/SC.Companion.2012.36

Christopher M. Sewell, J. Meredith, K. Moreland, T. Peterka, David E. DeMarle, Li-Ta Lo, J. Ahrens, Robert Maynard, Berk Geveci

Citations: 9

Poster: High Performance GPU Accelerated TSP Solver

2012 SC Companion: High Performance Computing, Networking Storage and Analysis Pub Date : 2012-11-10 DOI: 10.1109/SC.Companion.2012.225

K. Rocki, R. Suda

Abstract: We present a high-performance GPU-accelerated implementation of the 2-opt local search algorithm for the Traveling Salesman Problem (TSP). GPU usage greatly decreases the time needed to optimize the route, but it requires a complicated and well-tuned implementation. As the problem size increases, the time spent comparing graph edges grows significantly. We used instances from the TSPLIB library for testing, and our results show that with our GPU algorithm, the time needed to perform a simple local search operation can be decreased approximately 5 to 45 times compared to a parallel CPU implementation using 6 cores. The code has been implemented in CUDA as well as in OpenCL and tested on NVIDIA and AMD devices. The experimental studies have shown that the optimization algorithm using the GPU local search converges up to 300 times faster on average than the sequential CPU version, depending on the problem size. The main contributions of this work are the problem division scheme exploiting data locality, which allows arbitrarily large problem instances to be solved on a GPU, and the parallel implementation of the algorithm itself.

Pages: 1413-1414
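For readers unfamiliar with the 2-opt move named in the abstract above, here is a minimal sequential sketch in Python. It is not the poster's CUDA/OpenCL implementation and does not reproduce its problem division scheme; the function names `tour_length` and `two_opt` and the four-city example are assumptions made for illustration. The doubly nested loop over edge pairs is the edge comparison work that the abstract says grows with problem size.

```python
import math


def tour_length(points, tour):
    """Total length of the closed tour that visits points in `tour` order."""
    n = len(tour)
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % n]])
               for i in range(n))


def two_opt(points, tour):
    """Apply improving 2-opt moves until none remains (illustrative sketch)."""
    tour = list(tour)
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue  # these two edges share a city; no valid move
                a, b = points[tour[i]], points[tour[i + 1]]
                c, d = points[tour[j]], points[tour[(j + 1) % n]]
                # Change in length if edges (a,b),(c,d) become (a,c),(b,d).
                delta = (math.dist(a, c) + math.dist(b, d)
                         - math.dist(a, b) - math.dist(c, d))
                if delta < -1e-12:
                    # Reversing the segment between the edges applies the move.
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    improved = True
    return tour


# Hypothetical example: a unit square visited in a self-crossing order.
points = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]
result = two_opt(points, [0, 1, 2, 3])
```

Each candidate move depends only on four cities, so the pairwise evaluations are independent of one another, which is one plausible reason the search suits GPU execution.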

Citations: 0

The Application of High Performance Computing to Solvency and Profitability Calculations for Life Assurance Contracts

2012 SC Companion: High Performance Computing, Networking Storage and Analysis Pub Date : 2012-11-10 DOI: 10.1109/SC.Companion.2012.140

Mark Tucker, J. M. Bull

Abstract: In the UK, pension providers are required by law to demonstrate solvency on a regular basis; the regulations governing how solvency is demonstrated are changing. Historically, it has been sufficient to report solvency using a single 'best estimate' set of assumptions. The new regulations require a Monte Carlo approach to finding a worst-case scenario, which demands computing power beyond that of the systems currently available in the industry. This paper aims to show that the new regulations could be met by moving away from current actuarial valuation software packages and producing well-performing ab initio code, employing a variety of HPC techniques. Using a combination of algorithmic improvements, serial optimisations and multi-core parallelism, we demonstrate a performance improvement over commercial software of a factor of over 105. We show that this brings the Monte Carlo simulations within the bounds of practicality, and we suggest possibilities for further improvements, for example using clusters of GPUs. We also identify other possible use cases for high performance solvency and profitability calculations.

Pages: 1163-1170

Citations: 2

Poster: Acceleration of the BLAST Hydro Code on GPU

2012 SC Companion: High Performance Computing, Networking Storage and Analysis Pub Date : 2012-11-10 DOI: 10.1109/SC.Companion.2012.172

Tingxing Dong, T. Kolev, R. Rieben, V. Dobrev

Citations: 1

A Case for Optimistic Coordination in HPC Storage Systems

2012 SC Companion: High Performance Computing, Networking Storage and Analysis Pub Date : 2012-11-10 DOI: 10.1109/SC.Companion.2012.19

P. Carns, K. Harms, D. Kimpe, R. Ross, J. Wozniak, L. Ward, M. Curry, Ruth Klundt, Geoff Danielson, Cengiz Karakoyunlu, J. Chandy, Bradley Settlemeyer, W. Gropp

Abstract: High-performance computing (HPC) storage systems rely on access coordination to ensure that concurrent updates do not produce incoherent results. HPC storage systems typically employ pessimistic distributed locking to provide this functionality in cases where applications cannot perform their own coordination. This approach, however, introduces significant performance overhead and complicates fault handling. In this work we evaluate the viability of optimistic conditional storage operations as an alternative to distributed locking in HPC storage systems. We investigate design strategies and compare the two approaches in a prototype object storage system using a parallel read/modify/write benchmark. Our prototype illustrates that conditional operations can be easily integrated into distributed object storage systems and can outperform standard coordination primitives for simple update workloads. Our experiments show that conditional updates can achieve over two orders of magnitude higher performance than pessimistic locking for some parallel read/modify/write workloads.

Pages: 48-53
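The general idea of an optimistic conditional update, as opposed to holding a lock across a whole read/modify/write cycle, can be sketched as follows. This is a toy in-memory illustration in Python, not the paper's prototype object store; `VersionedStore`, `write_if_version`, `optimistic_increment`, and `run_workers` are hypothetical names, and the version-number check is one assumed way to express the condition. The internal mutex only makes each single store operation atomic, standing in for what a storage server would do locally; no lock is held between a client's read and its write.

```python
import threading


class VersionedStore:
    """Toy object store: each key maps to (version, value). Hypothetical."""

    def __init__(self):
        self._data = {}
        self._mutex = threading.Lock()  # atomicity of one operation only

    def read(self, key):
        with self._mutex:
            return self._data.get(key, (0, None))

    def write_if_version(self, key, expected_version, value):
        """Write only if the object is unchanged since it was read."""
        with self._mutex:
            version, _ = self._data.get(key, (0, None))
            if version != expected_version:
                return False  # another writer got there first
            self._data[key] = (version + 1, value)
            return True


def optimistic_increment(store, key):
    """Read/modify/write with retry; returns how many retries were needed."""
    retries = 0
    while True:
        version, value = store.read(key)
        if store.write_if_version(key, version, (value or 0) + 1):
            return retries
        retries += 1  # conflict detected: re-read and try again


def run_workers(store, key, workers, increments):
    """Run several threads that each increment the same object."""
    def work():
        for _ in range(increments):
            optimistic_increment(store, key)

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


store = VersionedStore()
run_workers(store, "counter", workers=4, increments=250)
```

In this sketch a conflicting writer pays only for a retry, and a client that fails mid-update leaves no lock behind to recover, which is the kind of trade-off the abstract contrasts with pessimistic distributed locking.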

Citations: 10

Communication avoiding algorithms

2012 SC Companion: High Performance Computing, Networking Storage and Analysis Pub Date : 2012-11-10 DOI: 10.1109/SC.Companion.2012.351

J. Demmel

Citations: 14