Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07)最新文献

筛选
英文 中文
Large-scale maximum likelihood-based phylogenetic analysis on the IBM BlueGene/L 基于IBM BlueGene/L的大规模最大似然系统发育分析
Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07) Pub Date : 2007-11-16 DOI: 10.1145/1362622.1362628
M. Ott, J. Zola, A. Stamatakis, S. Aluru
{"title":"Large-scale maximum likelihood-based phylogenetic analysis on the IBM BlueGene/L","authors":"M. Ott, J. Zola, A. Stamatakis, S. Aluru","doi":"10.1145/1362622.1362628","DOIUrl":"https://doi.org/10.1145/1362622.1362628","url":null,"abstract":"Phylogenetic inference is a grand challenge in Bioinformatics due to immense computational requirements. The increasing popularity of multi-gene alignments in biological studies, which typically provide a stable topological signal due to a more favorable ratio of the number of base pairs to the number of sequences, coupled with rapid accumulation of sequence data in general, poses new challenges for high performance computing. In this paper, we demonstrate how state-of-the-art Maximum Likelihood (ML) programs can be efficiently scaled to the IBM BlueGene/L (BG/L) architecture, by porting RAxML, which is currently among the fastest and most accurate programs for phylogenetic inference under the ML criterion. We simultaneously exploit coarse-grained and fine-grained parallelism that is inherent in every ML-based biological analysis. Performance is assessed using datasets consisting of 212 sequences and 566,470 base pairs, and 2,182 sequences and 51,089 base pairs, respectively. To the best of our knowledge, these are the largest datasets analyzed under ML to date. The capability to analyze such datasets will help to address novel biological questions via phylogenetic analyses. Our experimental results indicate that the fine-grained parallelization scales well up to 1, 024 processors. Moreover, a larger number of processors can be efficiently exploited by a combination of coarse-grained and fine-grained parallelism. Finally, we demonstrate that our parallelization scales equally well on an AMD Opteron cluster with a less favorable network latency to processor speed ratio. We recorded super-linear speedups in several cases due to increased cache efficiency.","PeriodicalId":274744,"journal":{"name":"Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116839000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 159
A 281 Tflops calculation for X-ray protein structure analysis with special-purpose computers MDGRAPE-3 用专用计算机mdgraph -3进行蛋白质x射线结构分析的281 Tflops计算
Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07) Pub Date : 2007-11-16 DOI: 10.1145/1362622.1362698
Y. Ohno, E. Nishibori, T. Narumi, T. Koishi, T. Tahirov, H. Ago, M. Miyano, R. Himeno, T. Ebisuzaki, M. Sakata, M. Taiji
{"title":"A 281 Tflops calculation for X-ray protein structure analysis with special-purpose computers MDGRAPE-3","authors":"Y. Ohno, E. Nishibori, T. Narumi, T. Koishi, T. Tahirov, H. Ago, M. Miyano, R. Himeno, T. Ebisuzaki, M. Sakata, M. Taiji","doi":"10.1145/1362622.1362698","DOIUrl":"https://doi.org/10.1145/1362622.1362698","url":null,"abstract":"We have achieved a sustained calculation speed of 281 Tflops for the optimization of the 3-D structures of proteins from the X-ray experimental data by the Genetic Algorithm - Direct Space (GA-DS) method. In this calculation we used MDGRAPE-3, special-purpose computer for molecular simulations, with the peak performance of 752 Tflops. In the GA-DS method, a set of selected parameters which define the crystal structures of proteins is optimized by the Genetic Algorithm. As a criterion to estimate the model parameters, we used the reliability factor R1 which indicates the statistical difference between the calculated and the measured diffraction data. To evaluate this factor it is necessary to reconstruct the diffraction patterns of the model structures every time the model is updated. Therefore, in this method the nonequispaced Discrete Fourier Transformation (DFT) used to calculate the diffraction patterns dominates most of the computation time. To accelerate DFT calculations, we used the special-purpose computer, MDGRAPE-3. A molecule, Carbamoyl-Phosphate Synthetase was investigated. The final reliability factors were much smaller than the typical values obtained in other methods such as the Molecular Replacement (MR) method. Our results successfully demonstrate that high-performance computing with GA-DS method on special-purpose computers is effective for the structure determination of biological molecules and the method has a potential to be widely used in near future.","PeriodicalId":274744,"journal":{"name":"Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126863421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 11
Age-based packet arbitration in large-radix k-ary n-cubes 大基数k元n-立方体中基于年龄的数据包仲裁
Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07) Pub Date : 2007-11-16 DOI: 10.1145/1362622.1362630
D. Abts, D. Weisser
{"title":"Age-based packet arbitration in large-radix k-ary n-cubes","authors":"D. Abts, D. Weisser","doi":"10.1145/1362622.1362630","DOIUrl":"https://doi.org/10.1145/1362622.1362630","url":null,"abstract":"As applications scale to increasingly large processor counts, the interconnection network is frequently the limiting factor in application performance. In order to achieve application scalability, the interconnect must maintain high bandwidth while minimizing variation in packet latency. As the offered load in the network increases with growing problem sizes and processor counts, so does the expected maximum packet latency in the network, directly impacting performance of applications with any synchronized communication. Age-based packet arbitration reduces the variance in packet latency as well as average latency. This paper describes the Cray XT router packet aging algorithm which allows globally fair arbitration by incorporating \"age\" in the packet output arbitration. We describe the parameters of the aging algorithm and how to arrive at appropriate settings. We show that an efficient aging algorithm reduces both the average packet latency and the variance in packet latency on communication-intensive benchmarks.","PeriodicalId":274744,"journal":{"name":"Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07)","volume":"22 11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125685487","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 54
Performance and cost optimization for multiple large-scale grid workflow applications 多个大规模网格工作流应用程序的性能和成本优化
Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07) Pub Date : 2007-11-16 DOI: 10.1145/1362622.1362639
Rubing Duan, R. Prodan, T. Fahringer
{"title":"Performance and cost optimization for multiple large-scale grid workflow applications","authors":"Rubing Duan, R. Prodan, T. Fahringer","doi":"10.1145/1362622.1362639","DOIUrl":"https://doi.org/10.1145/1362622.1362639","url":null,"abstract":"Scheduling large-scale applications on the Grid is a fundamental challenge and is critical to application performance and cost. Large-scale applications typically contain a large number of homogeneous and concurrent activities which are main bottlenecks, but open great potentials for optimization. This paper presents a new formulation of the well-known NP-complete problems and two novel algorithms that addresses the problems. The optimization problems are formulated as sequential cooperative games among workflow managers. Experimental results indicate that we have successfully devised and implemented one group of effective, efficient, and feasible approaches. They can produce soultuins of significantly better performance and cost than traditional algorithms. Our algorithms have considerably low time complexity and can assign 1,000,000 activities to 10,000 processors within 0.4 second on one Opteron processor. Moreover, the solutions can be practically performed by workflow managers, and the violation of QoS can be easily detected, which are critical to fault tolerance.","PeriodicalId":274744,"journal":{"name":"Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130357776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 61
GRAPE-DR: 2-Pflops massively-parallel computer with 512-core, 512-Gflops processor chips for scientific computing GRAPE-DR: 2 pflops的大规模并行计算机,512核512 gflops处理器芯片,用于科学计算
Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07) Pub Date : 2007-11-16 DOI: 10.1145/1362622.1362647
J. Makino, K. Hiraki, M. Inaba
{"title":"GRAPE-DR: 2-Pflops massively-parallel computer with 512-core, 512-Gflops processor chips for scientific computing","authors":"J. Makino, K. Hiraki, M. Inaba","doi":"10.1145/1362622.1362647","DOIUrl":"https://doi.org/10.1145/1362622.1362647","url":null,"abstract":"We describe the GRAPE-DR (Greatly Reduced Array of Processor Elements with Data Reduction) system, which will consist of 4096 processor chips each with 512 cores operating at the clock frequency of 500 MHz. The peak speed of a processor chip is 512Gflops (single precision) or 256 Gflops (double precision). The GRAPE-DR chip works as an attached processor to standard PCs. Currently, a PCI-X board with single GRAPE-DR chip is in operation. We are developing a 4-chip board with PCI-Express interface, which will have the peak performance of 1 Tflops. The final system will be a cluster of 512 PCs each with two GRAPE-DR boards. We plan to complete the final system by early 2009. The application area of GRAPE-DR covers particle-based simulations such as astrophysical many-body simulations and molecular-dynamics simulations, quantum chemistry calculations, various applications which requires dense matrix operations, and many other compute-intensive applications.","PeriodicalId":274744,"journal":{"name":"Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134446822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 50
Anatomy of a cortical simulator 皮质模拟器的解剖
Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07) Pub Date : 2007-11-16 DOI: 10.1145/1362622.1362627
R. Ananthanarayanan, D. Modha
{"title":"Anatomy of a cortical simulator","authors":"R. Ananthanarayanan, D. Modha","doi":"10.1145/1362622.1362627","DOIUrl":"https://doi.org/10.1145/1362622.1362627","url":null,"abstract":"Insights into brain's high-level computational principles will lead to novel cognitive systems, computing architectures, programming paradigms, and numerous practical applications. An important step towards this end is the study of large networks of cortical spiking neurons. We have built a cortical simulator, C2, incorporating several algorithmic enhancements to optimize the simulation scale and time, through: computationally efficient simulation of neurons in a clock-driven and synapses in an event-driven fashion; memory efficient representation of simulation state; and communication efficient message exchanges. Using phenomenological, single-compartment models of spiking neurons and synapses with spike-timing dependent plasticity, we represented a rat-scale cortical model (55 million neurons, 442 billion synapses) in STB memory of a 32, 768-processor BlueGene/L. With 1 millisecond resolution for neuronal dynamics and 1--20 milliseconds axonal delays, C2 can simulate 1 second of model time in 9 seconds per Hertz of average neuronal firing rate. In summary, by combining state-of-the-art hardware with innovative algorithms and software design, we simultaneously achieved unprecedented time-to-solution on an unprecedented problem size.","PeriodicalId":274744,"journal":{"name":"Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124875464","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 113
Inter-operating grids through delegated matchmaking 通过委托配对互操作网格
Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07) Pub Date : 2007-11-16 DOI: 10.1145/1362622.1362640
A. Iosup, T. Tannenbaum, M. Farrellee, D. Epema, M. Livny
{"title":"Inter-operating grids through delegated matchmaking","authors":"A. Iosup, T. Tannenbaum, M. Farrellee, D. Epema, M. Livny","doi":"10.1145/1362622.1362640","DOIUrl":"https://doi.org/10.1145/1362622.1362640","url":null,"abstract":"The grid vision of a single computing utility has yet to materíalize: while many grids with thousands of processors each exist, most work in isolation. An important obstacle for the effective and efficient inter-operation of grids is the problem of resource selection. In this paper we propose a solution to this problem that combines the hierarchical and decentralized approaches for interconnecting grids. In our solution, a hierarchy of grid sites is augmented with peer-to-peer connections between sites under the same administrative control. To operate this architecture, we employ the key concept of delegated matchmaking, which temporarily binds resources from remote sites to the local environment. With trace-based simulations we evaluate our solution under various infrastructural and load conditions, and we show that it outperforms other approaches to inter-operating grids. Specifically, we show that delegated matchmaking achieves up to 60% more goodput and completes 26% more jobs than its best alternative.","PeriodicalId":274744,"journal":{"name":"Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129974774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 104
Anomaly detection and diagnosis in grid environments 网格环境下的异常检测与诊断
Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07) Pub Date : 2007-11-16 DOI: 10.1145/1362622.1362667
Lingyun Yang, Chuang Liu, J. Schopf, Ian T Foster
{"title":"Anomaly detection and diagnosis in grid environments","authors":"Lingyun Yang, Chuang Liu, J. Schopf, Ian T Foster","doi":"10.1145/1362622.1362667","DOIUrl":"https://doi.org/10.1145/1362622.1362667","url":null,"abstract":"Identifying and diagnosing anomalies in application behavior is critical to delivering reliable application-level performance. In this paper we introduce a strategy to detect anomalies and diagnose the possible reasons behind them. Our approach extends the traditional window-based strategy by using signal-processing techniques to filter out recurring, background fluctuations in resource behavior. In addition, we have developed a diagnosis technique that uses standard monitoring data to determine which related changes in behavior may cause anomalies. We evaluate our anomaly detection and diagnosis technique by applying it in three contexts when we insert anomalies into the system at random intervals. The experimental results show that our strategy detects up to 96% of anomalies while reducing the false positive rate by up to 90% compared to the traditional window average strategy. In addition, our strategy can diagnose the reason for the anomaly approximately 75% of the time.","PeriodicalId":274744,"journal":{"name":"Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121148727","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 28
RobuSTore: a distributed storage architecture with robust and high performance RobuSTore:一种具有鲁棒性和高性能的分布式存储架构
Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07) Pub Date : 2007-11-16 DOI: 10.1145/1362622.1362682
Huaxia Xia
{"title":"RobuSTore: a distributed storage architecture with robust and high performance","authors":"Huaxia Xia","doi":"10.1145/1362622.1362682","DOIUrl":"https://doi.org/10.1145/1362622.1362682","url":null,"abstract":"Emerging large-scale scientific applications require to access large data objects in high and robust performance. We propose RobuSTore, a storage architecture that combines erasure codes and speculative access mechanisms for parallel write and read in distributed environments. The mechanisms can effectively aggregate the bandwidth from a large number of distributed disks and statistically tolerate pear-disk performance variation. Our simulation results affirm the high and robust performance of RobuSTore in both write and read operations compared to traditional parallel storage systems. For example, for a 1GB data access using 64 disks, RobuSTore achieves average bandwidth of 186MBps for write and 400MBps for read, nearly 6x and 15x that achieved by a RAID-0 system. The standard deviation of access latency is only 0.5 second, about 9% of the write latency and 20% of the read latency, and a 5-fold improvement from RAID-0. The improvements are achieved at moderate cost: about 40% increase in I/O operations and 2x-3x increase in storage capacity utilization.","PeriodicalId":274744,"journal":{"name":"Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115518066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 44
Optimizing center performance through coordinated data staging, scheduling and recovery 通过协调数据分期、调度和恢复,优化中心性能
Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07) Pub Date : 2007-11-16 DOI: 10.1145/1362622.1362696
Zhe Zhang, Chao Wang, Sudharshan S. Vazhkudai, Xiaosong Ma, Gregory G. Pike, John Cobb, F. Mueller
{"title":"Optimizing center performance through coordinated data staging, scheduling and recovery","authors":"Zhe Zhang, Chao Wang, Sudharshan S. Vazhkudai, Xiaosong Ma, Gregory G. Pike, John Cobb, F. Mueller","doi":"10.1145/1362622.1362696","DOIUrl":"https://doi.org/10.1145/1362622.1362696","url":null,"abstract":"Procurement and the optimized utilization of Petascale supercomputers and centers is a renewed national priority. Sustained performance and availability of such large centers is a key technical challenge significantly impacting their usability. Storage systems are known to be the primary fault source leading to data unavailability and job resubmissions. This results in reduced center performance, partially due to the lack of coordination between I/O activities and job scheduling. In this work, we propose the coordination of job scheduling with data staging/offloading and on-demand staged data reconstruction to address the availability of job input data and to improve center-wide performance. Fundamental to both mechanisms is the efficient management of transient data: in the way it is scheduled and recovered. Collectively, from a center's standpoint, these techniques optimize resource usage and increase its data/service availability. From a user's standpoint, they reduce the job turnaround time and optimize the allocated time usage.","PeriodicalId":274744,"journal":{"name":"Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07)","volume":"211 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132092061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 28
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信