{"title":"Revisiting Web Server Workload Invariants in the Context of Scientific Web Sites","authors":"Anne‐Marie Faber, Minaxi Gupta, C. Viecco","doi":"10.1145/1188455.1188570","DOIUrl":"https://doi.org/10.1145/1188455.1188570","url":null,"abstract":"The Web has evolved much from when Arlitt and Williamson proposed the ten Web workload invariants more than a decade ago. Many diverse communities now depend on the Web in their day-to-day lives. A current knowledge of the invariants for the Web is useful for performance enhancement and for synthetic Web workload generation. Invariants can also serve as a useful tool for detecting anomaly and misuse, a new dimension of Web usage arising from the change in trust assumptions in the Internet in the recent years. Focusing on scientific Web servers, we revisit the Web server workload invariants and find that only three out of the ten invariants hold as-is. We investigate appropriate revisions to the invariants that do not hold and also propose three new invariants for scientific Web servers","PeriodicalId":333909,"journal":{"name":"ACM/IEEE SC 2006 Conference (SC'06)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115118405","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive Routing in High-Radix Clos Network","authors":"John Kim, W. Dally, D. Abts","doi":"10.1145/1188455.1188552","DOIUrl":"https://doi.org/10.1145/1188455.1188552","url":null,"abstract":"Recent increase in the pin bandwidth of integrated-circuits has motivated an increase in the degree or radix of interconnection network routers. The folded-Clos network can take advantage of these high-radix routers and this paper investigates adaptive routing in such networks. We show that adaptive routing, if done properly, outperforms oblivious routing by providing lower latency, lower latency variance, and higher throughput with limited buffering. Adaptive routing is particularly useful in load balancing around nonuniformities caused by deterministically routed traffic or the presence of faults in the network. We evaluate alternative allocation algorithms used in adaptive routing and compare their performance. The use of randomization in the allocation algorithms can simplify the implementation while sacrificing minimal performance. The cost of adaptive routing, in terms of router latency and area, is increased in high-radix routers. We show that the use of imprecise queue information reduces the implementation complexity and precomputation of the allocations minimizes the impact of adaptive routing on router latency","PeriodicalId":333909,"journal":{"name":"ACM/IEEE SC 2006 Conference (SC'06)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124706288","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Supporting Dynamic Migration in Tightly Coupled Grid Applications","authors":"Liang Chen, Qian Zhu, G. Agrawal","doi":"10.1145/1188455.1188577","DOIUrl":"https://doi.org/10.1145/1188455.1188577","url":null,"abstract":"In recent years, there has been a growing trend towards supporting more tightly coupled applications on the grid, including scientific workflows, applications that use pipelined or data-flow like processing, and distributed streaming applications. As availability of resources can vary over time in a grid environment, dynamic reallocation of resources is very important for these applications, particularly because of their long-running nature, and because they often require large-volume data transfers between processing stages. This paper considers the problem of supporting and efficiently implementing dynamic resource allocation for tightly-coupled and pipelined applications in a grid environment. We provide an alternative to basic checkpointing, using the notion of light-weight summary structure (LSS), to enable efficient migration. The idea behind LSS is that at certain points during the execution of a processing stage, the state of the program can be summarized by a small amount of memory. This allows us to perform low-cost process migration, as long as such memory can be identified by an application developer, and migration is performed only at these points. Our implementation and evaluation of LSS based process migration has been in the context of the GATES (grid-based adaptive execution on streams) middleware that we have been developing. We also present an algorithm for dynamic resource allocation, and have shown an architecture for resource monitoring and allocation. We have extensively evaluated our implementation using three stream data processing applications, and show that the use of LSS allows efficient process migration","PeriodicalId":333909,"journal":{"name":"ACM/IEEE SC 2006 Conference (SC'06)","volume":"101 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128793215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scalable Algorithms for Molecular Dynamics Simulations on Commodity Clusters","authors":"K. Bowers, Edmond Chow, Huafeng Xu, R. Dror, M. Eastwood, Brent A. Gregersen, J. L. Klepeis, I. Kolossváry, Mark A. Moraes, Federico D. Sacerdoti, J. Salmon, Yibing Shan, D. Shaw","doi":"10.1145/1188455.1188544","DOIUrl":"https://doi.org/10.1145/1188455.1188544","url":null,"abstract":"Although molecular dynamics (MD) simulations of biomolecular systems often run for days to months, many events of great scientific interest and pharmaceutical relevance occur on long time scales that remain beyond reach. We present several new algorithms and implementation techniques that significantly accelerate parallel MD simulations compared with current state-of-the-art codes. These include a novel parallel decomposition method and message-passing techniques that reduce communication requirements, as well as novel communication primitives that further reduce communication time. We have also developed numerical techniques that maintain high accuracy while using single precision computation in order to exploit processor-level vector instructions. These methods are embodied in a newly developed MD code called Desmond that achieves unprecedented simulation throughput and parallel scalability on commodity clusters. Our results suggest that Desmond's parallel performance substantially surpasses that of any previously described code. For example, on a standard benchmark, Desmond's performance on a conventional Opteron cluster with 2K processors slightly exceeded the reported performance of IBM's Blue Gene/L machine with 32K processors running its Blue Matter MD code","PeriodicalId":333909,"journal":{"name":"ACM/IEEE SC 2006 Conference (SC'06)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126216560","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sequoia: Programming the Memory Hierarchy","authors":"K. Fatahalian, D. Horn, T. Knight, L. Leem, M. Houston, Ji Young Park, M. Erez, Manman Ren, A. Aiken, W. Dally, P. Hanrahan","doi":"10.1145/1188455.1188543","DOIUrl":"https://doi.org/10.1145/1188455.1188543","url":null,"abstract":"We present Sequoia, a programming language designed to facilitate the development of memory hierarchy aware parallel programs that remain portable across modern machines featuring different memory hierarchy configurations. Sequoia abstractly exposes hierarchical memory in the programming model and provides language mechanisms to describe communication vertically through the machine and to localize computation to particular memory locations within it. We have implemented a complete programming system, including a compiler and runtime systems for cell processor-based blade systems and distributed memory clusters, and demonstrate efficient performance running Sequoia programs on both of these platforms","PeriodicalId":333909,"journal":{"name":"ACM/IEEE SC 2006 Conference (SC'06)","volume":"358 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115857503","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sustainable Adaptive Grid Supercomputing: Multiscale Simulation of Semiconductor Processing across the Pacific","authors":"H. Takemiya, Yoshio Tanaka, S. Sekiguchi, S. Ogata, R. Kalia, A. Nakano, P. Vashishta","doi":"10.1145/1188455.1188566","DOIUrl":"https://doi.org/10.1145/1188455.1188566","url":null,"abstract":"We propose a reservation-based sustainable adaptive grid supercomputing paradigm to enable tightly coupled computations of considerable scale (involving over 1,000 processors) and duration (over tens of continuous days) on a grid of geographically distributed parallel supercomputers. The paradigm is demonstrated for an adaptive multiscale simulation application, in which accurate but compute-intensive quantum mechanical (QM) simulations are embedded within a classical molecular dynamics (MD) simulation only when and where high fidelity is required. Key technical innovations include: 1) an embedded divide-and-conquer algorithmic framework to maximally expose data and computation localities for enhanced scalability; 2) a buffered-cluster hybridization scheme to adaptively adjust MD/QM boundaries to maintain the model accuracy; and 3) a hybrid grid remote procedure call (GridRPC) + message passing interface (MPI) grid application framework to combine flexibility (adaptive resource allocation and migration), fault tolerance (automated fault recovery), and efficiency (scalable management of large computing resources). We have achieved an automated execution of multiscale MD/QM simulation on a Grid consisting of 6 supercomputer centers in Japan and the US (in total of 150 thousand processor hours) for the dynamic simulation of implanted oxygen atoms in a silicon substrate, in which the number of processors changes dynamically on demand and resources are allocated and migrated dynamically according to both reservations and unexpected faults. The simulation results reveal a strong dependence of the oxygen penetration depth on the incident oxygen-beam position, which is useful information to further advance SIMOX (separation by implanted oxygen) technique to fabricate high speed and low power-consumption semiconductor devices","PeriodicalId":333909,"journal":{"name":"ACM/IEEE SC 2006 Conference (SC'06)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117309761","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Nested OpenMP for Efficient Computation of 3D Critical Points in Multi-Block CFD Datasets","authors":"A. Gerndt, Samuel Sarholz, M. Wolter, Dieter an Mey, C. Bischof, T. Kuhlen","doi":"10.1145/1188455.1188553","DOIUrl":"https://doi.org/10.1145/1188455.1188553","url":null,"abstract":"Extraction of complex data structures like vector field topologies in large-scale, unsteady flow field datasets for the interactive exploration in virtual environments cannot be carried out without parallelization strategies. We present an approach based on Nested OpenMP to find critical points, which are the essential parts of velocity field topologies. We evaluate our parallelization scheme on several multi-block datasets, and present the results for various thread counts and loop schedules on all parallelization levels. Our experience suggests that upcoming massively multi-threaded processor architectures can be very advantageous for large-scale feature extractions","PeriodicalId":333909,"journal":{"name":"ACM/IEEE SC 2006 Conference (SC'06)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131237595","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hypergraph Partitioning for Automatic Memory Hierarchy Management","authors":"S. Krishnamoorthy, Ümit V. Çatalyürek, J. Nieplocha, A. Rountev, P. Sadayappan","doi":"10.1145/1188455.1188558","DOIUrl":"https://doi.org/10.1145/1188455.1188558","url":null,"abstract":"In this paper, we present a mechanism for automatic management of the memory hierarchy, including secondary storage, in the context of a global address space parallel programming framework. The programmer specifies the parallelism and locality in the computation. The scheduling of the computation into stages, together with the movement of the associated data between secondary storage and global memory, and between global memory and local memory, is automatically managed. A novel formulation of hypergraph partitioning is used to model the optimization problem of minimizing disk I/O. Experimental evaluation of the proposed approach using a sub-computation from the quantum chemistry domain shows a reduction in the disk I/O cost by up to a factor of 11, and a reduction in turnaround time by up to 49%, as compared to alternative approaches used in state-of-the-art quantum chemistry codes","PeriodicalId":333909,"journal":{"name":"ACM/IEEE SC 2006 Conference (SC'06)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132166099","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive, Transparent Frequency and Voltage Scaling of Communication Phases in MPI Programs","authors":"M. Lim, V. Freeh, D. Lowenthal","doi":"10.1145/1188455.1188567","DOIUrl":"https://doi.org/10.1145/1188455.1188567","url":null,"abstract":"Although users of high-performance computing are most interested in raw performance, both energy and power consumption have become critical concerns. Some microprocessors allow frequency and voltage scaling, which enables a system to reduce CPU performance and power when the CPU is not on the critical path. When properly directed, such dynamic frequency and voltage scaling can produce significant energy savings with little performance penalty. This paper presents an MPI runtime system that dynamically reduces CPU performance during communication phases in MPI programs. It dynamically identifies such phases and, without profiling or training, selects the CPU frequency in order to minimize energy-delay product. All analysis and subsequent frequency and voltage scaling is within MPI and so is entirely transparent to the application. This means that the large number of existing MPI programs, as well as new ones being developed, can use our system without modification. Results show that the average reduction in energy-delay product over the NAS benchmark suite is 10% - the average energy reduction is 12% while the average execution time increase is only 2.1%","PeriodicalId":333909,"journal":{"name":"ACM/IEEE SC 2006 Conference (SC'06)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115404473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CycleMeter: Detecting Fraudulent Peers in Internet Cycle Sharing","authors":"Zheng Zhang, Y. C. Hu, S. Midkiff","doi":"10.1145/1188455.1188584","DOIUrl":"https://doi.org/10.1145/1188455.1188584","url":null,"abstract":"Internet cycle sharing systems that utilize idle computing resources dramatically increase the available resources for high performance computing. Fraudulent resource providers, however, can subvert these systems. While previous research has investigated protection against resource providers that return bad results, we consider a different fraudulent behavior - cycle short-changing - in which the resource provider faithfully executes the submitted job, but using a smaller percentage of the CPU resources than he/she promises. To detect this short-changing, we propose CycleMeter, a tool that allows a remotely executing application to accurately monitor the percentage of CPU resources it is utilizing throughout its execution period. CycleMeter employs a microbenchmark to measure the instantaneous CPU utilization of the application, and employs a simple and practical mechanism for embedding the microbenchmark into the application. Our experimental results on three operating systems and uniprocessor and multiprocessor machines show that CycleMeter is portable, incurs a low overhead, and is highly effective in detecting a spectrum of cycle shortchanging behavior","PeriodicalId":333909,"journal":{"name":"ACM/IEEE SC 2006 Conference (SC'06)","volume":"2011 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130000459","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}