Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07): Latest Articles, Page 3

Large-scale maximum likelihood-based phylogenetic analysis on the IBM BlueGene/L

Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07) Pub Date : 2007-11-16 DOI: 10.1145/1362622.1362628

M. Ott, J. Zola, A. Stamatakis, S. Aluru

Abstract: Phylogenetic inference is a grand challenge in Bioinformatics due to immense computational requirements. The increasing popularity of multi-gene alignments in biological studies, which typically provide a stable topological signal due to a more favorable ratio of the number of base pairs to the number of sequences, coupled with rapid accumulation of sequence data in general, poses new challenges for high performance computing. In this paper, we demonstrate how state-of-the-art Maximum Likelihood (ML) programs can be efficiently scaled to the IBM BlueGene/L (BG/L) architecture, by porting RAxML, which is currently among the fastest and most accurate programs for phylogenetic inference under the ML criterion. We simultaneously exploit coarse-grained and fine-grained parallelism that is inherent in every ML-based biological analysis. Performance is assessed using datasets consisting of 212 sequences and 566,470 base pairs, and 2,182 sequences and 51,089 base pairs, respectively. To the best of our knowledge, these are the largest datasets analyzed under ML to date. The capability to analyze such datasets will help to address novel biological questions via phylogenetic analyses. Our experimental results indicate that the fine-grained parallelization scales well up to 1,024 processors. Moreover, a larger number of processors can be efficiently exploited by a combination of coarse-grained and fine-grained parallelism. Finally, we demonstrate that our parallelization scales equally well on an AMD Opteron cluster with a less favorable network latency to processor speed ratio. We recorded super-linear speedups in several cases due to increased cache efficiency.
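
As an illustration of two quantities the abstract mentions, the following Python sketch computes the ratio of base pairs to sequences for both datasets and checks whether a speedup is super-linear, that is, greater than the number of processors. The dataset dimensions come from the abstract; the function names are hypothetical, and no timing figures from the paper are assumed.

```python
def sites_per_sequence(num_sequences: int, num_base_pairs: int) -> float:
    """Base pairs (alignment columns) available per sequence."""
    return num_base_pairs / num_sequences


def speedup(t_serial: float, t_parallel: float) -> float:
    """Classic speedup: serial run time divided by parallel run time."""
    return t_serial / t_parallel


def is_super_linear(t_serial: float, t_parallel: float, num_procs: int) -> bool:
    """A speedup is super-linear when it exceeds the processor count."""
    return speedup(t_serial, t_parallel) > num_procs


# Dataset dimensions quoted in the abstract: (sequences, base pairs).
DATASETS = {
    "212 sequences": (212, 566_470),
    "2,182 sequences": (2_182, 51_089),
}

if __name__ == "__main__":
    for name, (n_seq, n_bp) in DATASETS.items():
        ratio = sites_per_sequence(n_seq, n_bp)
        print(f"{name}: {ratio:.1f} base pairs per sequence")
```

With these figures the 212-sequence alignment has roughly 2,672 base pairs per sequence, while the 2,182-sequence alignment has roughly 23.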

Citations: 159

A 281 Tflops calculation for X-ray protein structure analysis with special-purpose computers MDGRAPE-3

Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07) Pub Date : 2007-11-16 DOI: 10.1145/1362622.1362698

Y. Ohno, E. Nishibori, T. Narumi, T. Koishi, T. Tahirov, H. Ago, M. Miyano, R. Himeno, T. Ebisuzaki, M. Sakata, M. Taiji

Abstract: We have achieved a sustained calculation speed of 281 Tflops for the optimization of the 3-D structures of proteins from X-ray experimental data by the Genetic Algorithm - Direct Space (GA-DS) method. In this calculation we used MDGRAPE-3, a special-purpose computer for molecular simulations, with a peak performance of 752 Tflops. In the GA-DS method, a set of selected parameters which define the crystal structures of proteins is optimized by the Genetic Algorithm. As a criterion to estimate the model parameters, we used the reliability factor R1, which indicates the statistical difference between the calculated and the measured diffraction data. To evaluate this factor it is necessary to reconstruct the diffraction patterns of the model structures every time the model is updated. Therefore, in this method the nonequispaced Discrete Fourier Transformation (DFT) used to calculate the diffraction patterns dominates the computation time. To accelerate DFT calculations, we used the special-purpose computer MDGRAPE-3. A molecule, Carbamoyl-Phosphate Synthetase, was investigated. The final reliability factors were much smaller than the typical values obtained in other methods such as the Molecular Replacement (MR) method. Our results successfully demonstrate that high-performance computing with the GA-DS method on special-purpose computers is effective for the structure determination of biological molecules, and the method has the potential to be widely used in the near future.
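
To make the abstract's cost argument concrete, the Python sketch below evaluates a structure factor by direct summation over atoms (a nonequispaced DFT) and scores a model with an R1-style factor. It is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the conventional crystallographic definition R1 = sum(| |F_obs| - |F_calc| |) / sum(|F_obs|), fractional atomic coordinates, and constant per-atom scattering factors. All function names are hypothetical, and nothing here reflects the MDGRAPE-3 programming interface.

```python
import cmath
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def structure_factor(hkl: Vec3, positions: Sequence[Vec3],
                     scattering: Sequence[float]) -> complex:
    """Direct summation F(h) = sum_j f_j * exp(2*pi*i * h . r_j).

    Positions are fractional coordinates (assumption); the cost is one
    complex exponential per atom per reflection, which is why this step
    dominates when it is repeated for every candidate model.
    """
    total = 0j
    for (x, y, z), f in zip(positions, scattering):
        phase = 2.0 * cmath.pi * (hkl[0] * x + hkl[1] * y + hkl[2] * z)
        total += f * cmath.exp(1j * phase)
    return total


def r1_factor(f_obs: Sequence[float], f_calc: Sequence[complex]) -> float:
    """Conventional R1 (assumed definition): 0 means perfect agreement."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


if __name__ == "__main__":
    # Sustained versus peak performance, both figures from the abstract.
    print(f"sustained fraction of peak: {281 / 752:.1%}")
```

In a genetic-algorithm loop, each candidate parameter set would be turned into atomic positions, passed through `structure_factor` for every measured reflection, and ranked by `r1_factor`.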

Citations: 11

Age-based packet arbitration in large-radix k-ary n-cubes

Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07) Pub Date : 2007-11-16 DOI: 10.1145/1362622.1362630

D. Abts, D. Weisser

Citations: 54

Performance and cost optimization for multiple large-scale grid workflow applications

Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07) Pub Date : 2007-11-16 DOI: 10.1145/1362622.1362639

Rubing Duan, R. Prodan, T. Fahringer

Citations: 61

GRAPE-DR: 2-Pflops massively-parallel computer with 512-core, 512-Gflops processor chips for scientific computing

Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07) Pub Date : 2007-11-16 DOI: 10.1145/1362622.1362647

J. Makino, K. Hiraki, M. Inaba

Citations: 50

Anatomy of a cortical simulator

Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07) Pub Date : 2007-11-16 DOI: 10.1145/1362622.1362627

R. Ananthanarayanan, D. Modha

Abstract: Insights into the brain's high-level computational principles will lead to novel cognitive systems, computing architectures, programming paradigms, and numerous practical applications. An important step towards this end is the study of large networks of cortical spiking neurons. We have built a cortical simulator, C2, incorporating several algorithmic enhancements to optimize the simulation scale and time, through: computationally efficient simulation of neurons in a clock-driven and synapses in an event-driven fashion; memory efficient representation of simulation state; and communication efficient message exchanges. Using phenomenological, single-compartment models of spiking neurons and synapses with spike-timing dependent plasticity, we represented a rat-scale cortical model (55 million neurons, 442 billion synapses) in 8TB memory of a 32,768-processor BlueGene/L. With 1 millisecond resolution for neuronal dynamics and 1-20 millisecond axonal delays, C2 can simulate 1 second of model time in 9 seconds per Hertz of average neuronal firing rate. In summary, by combining state-of-the-art hardware with innovative algorithms and software design, we simultaneously achieved unprecedented time-to-solution on an unprecedented problem size.
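
The abstract's performance figure, 9 seconds of wall-clock time per second of model time per Hertz of average firing rate, can be turned into a small calculator. The Python sketch below assumes that the cost scales linearly with both model time and mean firing rate, which is one reading of the abstract rather than a statement from the paper; the function names are hypothetical. The neuron and synapse counts are those quoted in the abstract.

```python
# Wall-clock seconds per model second per Hz of mean firing rate (from the abstract).
SLOWDOWN_PER_HZ = 9.0

NEURONS = 55_000_000          # rat-scale model, from the abstract
SYNAPSES = 442_000_000_000    # from the abstract


def wall_clock_seconds(model_seconds: float, mean_rate_hz: float) -> float:
    """Estimated run time, assuming linear scaling in time and firing rate."""
    return SLOWDOWN_PER_HZ * model_seconds * mean_rate_hz


def synapses_per_neuron(neurons: int, synapses: int) -> float:
    """Average number of synapses per neuron in the model."""
    return synapses / neurons


if __name__ == "__main__":
    print(f"synapses per neuron: {synapses_per_neuron(NEURONS, SYNAPSES):.0f}")
    # Example rates are illustrative placeholders, not values from the paper.
    for rate_hz in (1.0, 5.0, 10.0):
        print(f"{rate_hz:>4.0f} Hz: {wall_clock_seconds(1.0, rate_hz):.0f} s per model second")
```

Under this linear reading, the model averages about 8,036 synapses per neuron, and one model second at a mean rate of 1 Hz takes 9 seconds of wall-clock time.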

Citations: 113

Inter-operating grids through delegated matchmaking

Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07) Pub Date : 2007-11-16 DOI: 10.1145/1362622.1362640

A. Iosup, T. Tannenbaum, M. Farrellee, D. Epema, M. Livny

Citations: 104

Anomaly detection and diagnosis in grid environments

Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07) Pub Date : 2007-11-16 DOI: 10.1145/1362622.1362667

Lingyun Yang, Chuang Liu, J. Schopf, Ian T. Foster

Citations: 28

RobuSTore: a distributed storage architecture with robust and high performance

Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07) Pub Date : 2007-11-16 DOI: 10.1145/1362622.1362682

Huaxia Xia

Citations: 44

Optimizing center performance through coordinated data staging, scheduling and recovery

Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07) Pub Date : 2007-11-16 DOI: 10.1145/1362622.1362696

Zhe Zhang, Chao Wang, Sudharshan S. Vazhkudai, Xiaosong Ma, Gregory G. Pike, John Cobb, F. Mueller

Citations: 28