{"title":"Instruction-level simulation of a cluster at scale","authors":"E. León, R. Riesen, A. Maccabe, P. Bridges","doi":"10.1145/1654059.1654063","DOIUrl":"https://doi.org/10.1145/1654059.1654063","url":null,"abstract":"Instruction-level simulation is necessary to evaluate new architectures. However, single-node simulation cannot predict the behavior of a parallel application on a supercomputer. We present a scalable simulator that couples a cycle-accurate node simulator with a supercomputer network model. Our simulator executes individual instances of IBM's Mambo PowerPC simulator on hundreds of cores. We integrated a NIC emulator into Mambo and model the network instead of fully simulating it. This decouples the individual node simulators and makes our design scalable. Our simulator runs unmodified parallel message-passing applications on hundreds of nodes. We can change network and detailed node parameters, inject network traffic directly into caches, and use different policies to decide when that is an advantage. This paper describes our simulator in detail, evaluates it, and demonstrates its scalability. We show its suitability for architecture research by evaluating the impact of cache injection on parallel application performance.","PeriodicalId":371415,"journal":{"name":"Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128553627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive and scalable metadata management to support a trillion files","authors":"Jing Xing, Jin Xiong, Ninghui Sun, Jie Ma","doi":"10.1145/1654059.1654086","DOIUrl":"https://doi.org/10.1145/1654059.1654086","url":null,"abstract":"Nowadays more and more applications require file systems to efficiently maintain million or more files. How to provide high access performance with such a huge number of files and such large directories is a big challenge for cluster file systems. Limited by static directory structures, existing file systems will be prohibitively inefficient for this use. To address this problem, we present a scalable and adaptive metadata management system which aims to maintain a trillion files efficiently. Firstly, our system exploits an adaptive two-level directory partitioning based on extendible hashing to manage very large directories. Secondly, our system utilizes fine-grained parallel processing within a directory and greatly improves performance of file creation or deletion. Thirdly, our system uses multiple-layered metadata cache management which improves memory utilization on the servers. And finally, our system uses a dynamic loadbalance mechanism based on consistent hashing which enables our system to scale up and down easily. Our performance results on 32 metadata servers show that our user-level prototype implementation can create more than 74 thousand files per second and can get more than 270 thousand files' attributes per second in a single directory with 100 million files. Moreover, it delivers a peak throughput of more than 60 thousand file creates/second in a single directory with 1 billion files.","PeriodicalId":371415,"journal":{"name":"Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis","volume":"365 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115906264","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A case for integrated processor-cache partitioning in chip multiprocessors","authors":"Shekhar Srikantaiah, R. Das, Asit K. Mishra, C. Das, M. Kandemir","doi":"10.1145/1654059.1654066","DOIUrl":"https://doi.org/10.1145/1654059.1654066","url":null,"abstract":"Existing cache partitioning schemes are designed in a manner oblivious to the implicit processor partitioning enforced by the operating system. This paper examines an operating system directed integrated processor-cache partitioning scheme that partitions both the available processors and the shared cache in a chip multiprocessor among different multi-threaded applications. Extensive simulations using a set of multiprogrammed workloads show that our integrated processor-cache partitioning scheme facilitates achieving better performance isolation as compared to state of the art hardware/software based solutions. Specifically, our integrated processor-cache partitioning approach performs, on an average, 20.83% and 14.14% better than equal partitioning and the implicit partitioning enforced by the underlying operating system, respectively, on the fair speedup metric on an 8 core system. We also compare our approach to processor partitioning alone and a state-of-the-art cache partitioning scheme and our scheme fares 8.21% and 9.19% better than these schemes on a 16 core system.","PeriodicalId":371415,"journal":{"name":"Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis","volume":"244 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115960168","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SmartStore: a new metadata organization paradigm with semantic-awareness for next-generation file systems","authors":"Yu Hua, Hong Jiang, Yifeng Zhu, D. Feng, Lei Tian","doi":"10.1145/1654059.1654070","DOIUrl":"https://doi.org/10.1145/1654059.1654070","url":null,"abstract":"Existing storage systems using hierarchical directory tree do not meet scalability and functionality requirements for exponentially growing datasets and increasingly complex queries in Exabyte-level systems with billions of files. This paper proposes semantic-aware organization, called SmartStore, which exploits metadata semantics of files to judiciously aggregate correlated files into semantica-ware groups by using information retrieval tools. Decentralized design improves system scalability and reduces query latency for complex queries (range and top-k queries), which is conducive to constructing semantic-aware caching, and conventional filename-based query. SmartStore limits search scope of complex query to a single or a minimal number of semantically related groups and avoids or alleviates brute-force search in entire system. Extensive experiments using real-world traces show that SmartStore improves system scalability and reduces query latency over basic database approaches by one thousand times. To the best of our knowledge, this is the first study implementing complex queries in large-scale file systems.","PeriodicalId":371415,"journal":{"name":"Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117025410","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"42 TFlops hierarchical N-body simulations on GPUs with applications in both astrophysics and turbulence","authors":"T. Hamada, T. Narumi, Rio Yokota, K. Yasuoka, Keigo Nitadori, M. Taiji","doi":"10.1145/1654059.1654123","DOIUrl":"https://doi.org/10.1145/1654059.1654123","url":null,"abstract":"As an entry for the 2009 Gordon Bell price/performance prize, we present the results of two different hierarchical N-body simulations on a cluster of 256 graphics processing units (GPUs). Unlike many previous N-body simulations on GPUs that scale as O(N2), the present method calculates the O(N log N) treecode and O(N) fast multipole method (FMM) on the GPUs with unprecedented efficiency. We demonstrate the performance of our method by choosing one standard application -a gravitational N-body simulation- and one non-standard application -simulation of turbulence using vortex particles. The gravitational simulation using the treecode with 1,608,044,129 particles showed a sustained performance of 42.15 TFlops. The vortex particle simulation of homogeneous isotropic turbulence using the periodic FMM with 16,777,216 particles showed a sustained performance of 20.2 TFlops. The overall cost of the hardware was 228,912 dollars. The maximum corrected performance is 28.1TFlops for the gravitational simulation, which results in a cost performance of 124 MFlops/$. This correction is performed by counting the Flops based on the most efficient CPU algorithm. Any extra Flops that arise from the GPU implementation and parameter differences are not included in the 124 MFlops/$.","PeriodicalId":371415,"journal":{"name":"Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114376230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enabling high-fidelity neutron transport simulations on petascale architectures","authors":"D. Kaushik, Micheal Smith, A. Wollaber, Barry F. Smith, A. Siegel, W. Yang","doi":"10.1145/1654059.1654128","DOIUrl":"https://doi.org/10.1145/1654059.1654128","url":null,"abstract":"The UNIC code is being developed as part of the DOE's Nuclear Energy Advanced Modeling and Simulation (NEAMS) program. UNIC is an unstructured, deterministic neutron transport code that allows a highly detailed description of a nuclear reactor. The primary goal of our simulation efforts is to reduce the uncertainties and biases in reactor design calculations by progressively replacing existing multilevel averaging (homogenization) techniques with more direct solution methods based on first principles. Since the neutron transport equation is seven dimensional (three in space, two in angle, one in energy, and one in time), these simulations are among the most memory and computationally intensive in all of computational science. In order to model the complex physics of a reactor core, billions of spatial elements, hundreds of angles, and thousands of energy groups are necessary, leading to problem sizes with petascale degrees of freedom. Therefore, these calculations exhaust memory resources on current and even next-generation architectures. In this paper, we present UNIC simulation results for two important representative problems in reactor design and analysis---PHENIX and ZPR-6. In each case, UNIC shows good weak scalability on up to 163,840 cores of Blue Gene/P (Argonne) and 122,800 cores of XT5 (Oak Ridge). While our current per processor performance is less than ideal, we demonstrate a clear ability to effectively utilize the leadership computing platforms. Over the coming months, we aim to improve the per processor performance while maintaining the high parallel efficiency by employing better algorithms such as spatial p- and h-multigrid preconditioners, optimized matrix-tensor operations, and weighted partitioning for better load balancing. Combining these additional algorithmic improvements with the availability of larger parallel machines should allow us to realize our long-term goal of explicit geometry coupled multiphysics reactor simulations. In the long run, these high-fidelity simulations will be able to replace expensive mockup experiments and reduce the uncertainty in crucial reactor design and operational parameters.","PeriodicalId":371415,"journal":{"name":"Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130483382","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine learning-based prefetch optimization for data center applications","authors":"Shih-Wei Liao, Tzu-Han Hung, Donald Nguyen, Chinyen Chou, Chia-Heng Tu, Hucheng Zhou","doi":"10.1145/1654059.1654116","DOIUrl":"https://doi.org/10.1145/1654059.1654116","url":null,"abstract":"Performance tuning for data centers is essential and complicated. It is important since a data center comprises thousands of machines and thus a single-digit performance improvement can significantly reduce cost and power consumption. Unfortunately, it is extremely difficult as data centers are dynamic environments where applications are frequently released and servers are continually upgraded. In this paper, we study the effectiveness of different processor prefetch configurations, which can greatly influence the performance of memory system and the overall data center. We observe a wide performance gap when comparing the worst and best configurations, from 1.4% to 75.1%, for 11 important data center applications. We then develop a tuning framework which attempts to predict the optimal configuration based on hardware performance counters. The framework achieves performance within 1% of the best performance of any single configuration for the same set of applications.","PeriodicalId":371415,"journal":{"name":"Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114579533","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Increasing memory miss tolerance for SIMD cores","authors":"D. Tarjan, Jiayuan Meng, K. Skadron","doi":"10.1145/1654059.1654082","DOIUrl":"https://doi.org/10.1145/1654059.1654082","url":null,"abstract":"Manycore processors with wide SIMD cores are becoming a popular choice for the next generation of throughput oriented architectures. We introduce a hardware technique called \"diverge on miss\" that allows SIMD cores to better tolerate memory latency for workloads with non-contiguous memory access patterns. Individual threads within a SIMD \"warp\" are allowed to slip behind other threads in the same warp, letting the warp continue execution even if a subset of threads are waiting on memory. Diverge on miss can either increase the performance of a given design by up to a factor of 3.14 for a single warp per core, or reduce the number of warps per core needed to sustain a given level of performance from 16 to 2 warps, reducing the area per core by 35%.","PeriodicalId":371415,"journal":{"name":"Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128569852","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HyperX: topology, routing, and packaging of efficient large-scale networks","authors":"Jung Ho Ahn, N. Binkert, A. Davis, M. McLaren, R. Schreiber","doi":"10.1145/1654059.1654101","DOIUrl":"https://doi.org/10.1145/1654059.1654101","url":null,"abstract":"In the push to achieve exascale performance, systems will grow to over 100,000 sockets, as growing cores-per-socket and improved single-core performance provide only part of the speedup needed. These systems will need affordable interconnect structures that scale to this level. To meet the need, we consider an extension of the hypercube and flattened butterfly topologies, the HyperX, and give an adaptive routing algorithm, DAL. HyperX takes advantage of high-radix switch components that integrated photonics will make available. Our main contributions include a formal descriptive framework, enabling a search method that finds optimal HyperX configurations; DAL; and a low cost packaging strategy for an exascale HyperX. Simulations show that HyperX can provide performance as good as a folded Clos, with fewer switches. We also describe a HyperX packaging scheme that reduces system cost. Our analysis of efficiency, performance, and packaging demonstrates that the HyperX is a strong competitor for exascale networks.","PeriodicalId":371415,"journal":{"name":"Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129314288","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance evaluation of NEC SX-9 using real science and engineering applications","authors":"Takashi Soga, A. Musa, Y. Shimomura, Ryusuke Egawa, K. Itakura, H. Takizawa, Koki Okabe, Hiroaki Kobayashi","doi":"10.1145/1654059.1654088","DOIUrl":"https://doi.org/10.1145/1654059.1654088","url":null,"abstract":"This paper describes a new-generation vector parallel supercomputer, NEC SX-9 system. The SX-9 processor has an outstanding core to achieve over 100Gflop/s, and a software-controllable on-chip cache to keep the high ratio of the memory bandwidth to the floating-point operation rate. Moreover, its large SMP nodes of 16 vector processors with 1.6Tflop/s performance and 1TB memory are connected with dedicated network switches, which can achieve inter-node communication at 128GB/s per direction. The sustained performance of the SX-9 processor is evaluated using six practical applications in comparison with conventional vector processors and the latest scalar processor such as Nehalem-EP. Based on the results, this paper discusses the performance tuning strategies for new-generation vector systems. An SX-9 system of 16 nodes is also evaluated by using the HPC challenge benchmark suite and a CFD code. Those evaluation results clarify the highest sustained performance and scalability of the SX-9 system.","PeriodicalId":371415,"journal":{"name":"Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125096729","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}