{"title":"The Effects of Systemic Packet Loss on Aggregate TCP Flows","authors":"T. Hacker, Brian D. Noble, B. Athey","doi":"10.1109/SC.2002.10029","DOIUrl":"https://doi.org/10.1109/SC.2002.10029","url":null,"abstract":"The use of parallel TCP connections to increase throughput for bulk transfers is common practice within the high performance computing community. However, the effectiveness, fairness, and efficiency of data transfers across parallel connections is unclear. This paper considers the impact of systemic non-congestion related packet loss on the effectiveness, fairness, and efficiency of parallel TCP transmissions. The results indicate that parallel connections are effective at increasing aggregate throughput, and increase the overall efficiency of the network bottleneck. In the presence of congestion related losses, parallel flows steal bandwidth from other single stream flows. A simple modification is presented that reduces the fairness problems when congestion is present, but retains effectiveness and efficiency.","PeriodicalId":302800,"journal":{"name":"ACM/IEEE SC 2002 Conference (SC'02)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129972102","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Monitoring Data Archives for Grid Environments","authors":"Jason R. Lee, D. Gunter, M. Stoufer, B. Tierney","doi":"10.1109/SC.2002.10047","DOIUrl":"https://doi.org/10.1109/SC.2002.10047","url":null,"abstract":"Developers and users of high-performance distributed systems often observe performance problems such as unexpectedly low throughput or high latency. To determine the source of these performance problems, detailed end-to-end monitoring data from applications, networks, operating systems, and hardware must be correlated across time and space. Researchers need to be able to view and compare this very detailed monitoring data from a variety of angles. To address this problem, we propose a relational monitoring data archive that is designed to efficiently handle high-volume streams of monitoring data. In this paper we present an instrumentation and monitoring event archive service that can be used to collect and aggregate detailed end-to-end monitoring information from distributed applications. This archive service is designed to be scalable and fault tolerant. We also show how the archive is based on the \"Grid Monitoring Architecture\" defined by the Global Grid Forum.","PeriodicalId":302800,"journal":{"name":"ACM/IEEE SC 2002 Conference (SC'02)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123709094","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dual-Level Parallelism for Deterministic and Stochastic CFD Problems","authors":"S. Dong, G. Karniadakis","doi":"10.1109/SC.2002.10005","DOIUrl":"https://doi.org/10.1109/SC.2002.10005","url":null,"abstract":"A hybrid two-level parallelism using MPI/OpenMP is implemented in the general-purpose spectral/hp element CFD code NekTar to take advantage of the hierarchical structures arising in deterministic and stochastic CFD problems. We take a coarse grain approach to shared-memory parallelism with OpenMP and employ a workload-splitting scheme that can reduce the OpenMP synchronizations to the minimum. The hybrid implementation shows good scalability with respect to both the problem size and the number of processors in case of a fixed problem size. With the same number of processors, the hybrid model with 2 (or 4) OpenMP threads per MPI process is observed to perform better than pure MPI and pure OpenMP on the NCSA SGI Origin 2000, while the pure MPI model performs the best on the IBM SP3 at SDSC and on the Compaq Alpha cluster at PSC. A key new result is that the use of threads facilitates effectively p-refinement, which is crucial to adaptive discretization using high-order methods.","PeriodicalId":302800,"journal":{"name":"ACM/IEEE SC 2002 Conference (SC'02)","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124854937","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SMP System Interconnect Instrumentation for Performance Analysis","authors":"L. Noordergraaf, R. Zak","doi":"10.5555/762761.762779","DOIUrl":"https://doi.org/10.5555/762761.762779","url":null,"abstract":"The system interconnect is often the performance bottleneck in SMP computers. Although modern SMPs include event counters on processors and interconnects, these provide limited information about the interaction of processors vying for shared resources. Additionally, transaction sources and addresses are not readily available, making analysis of access patterns and data locality difficult. Enhanced system interconnect instrumentation is required to extract this information. This paper describes instrumentation implemented for monitoring the system interconnect on Sun Fire™ servers. The instrumentation supports sophisticated programmable filtering of event counters, allowing us to construct histograms of system interconnect activity, and a FIFO to capture trace sequences. Our implementation results in a very small hardware footprint, making it appropriate for inclusion in commodity hardware. We also describe a sampling of software tools and results based on this infrastructure. Applications have included performance profiling, architectural studies, and hardware bringup and debugging.","PeriodicalId":302800,"journal":{"name":"ACM/IEEE SC 2002 Conference (SC'02)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121238166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gilgamesh: A Multithreaded Processor-In-Memory Architecture for Petaflops Computing","authors":"T. Sterling, H. Zima","doi":"10.1109/SC.2002.10061","DOIUrl":"https://doi.org/10.1109/SC.2002.10061","url":null,"abstract":"Processor-in-Memory (PIM) architectures avoid the von Neumann bottleneck in conventional machines by integrating high-density DRAM and CMOS logic on the same chip. Parallel systems based on this new technology are expected to provide higher scalability, adaptability, robustness, fault tolerance and lower power consumption than current MPPs or commodity clusters. In this paper we describe the design of Gilgamesh, a PIM-based massively parallel architecture, and elements of its execution model. Gilgamesh extends existing PIM capabilities by incorporating advanced mechanisms for virtualizing tasks and data and providing adaptive resource management for load balancing and latency tolerance. The Gilgamesh execution model is based on macroservers, a middleware layer which supports object-based runtime management of data and threads, allowing explicit and dynamic control of locality and load balancing. The paper concludes with a discussion of related research activities and an outlook to future work.","PeriodicalId":302800,"journal":{"name":"ACM/IEEE SC 2002 Conference (SC'02)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126890411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Proteus Multiprotocol Message Library","authors":"K. Chiu, M. Govindaraju, Dennis Gannon","doi":"10.1109/SC.2002.10050","DOIUrl":"https://doi.org/10.1109/SC.2002.10050","url":null,"abstract":"Grid systems span manifold organizations and application domains. Because this diverse environment inevitably engenders multiple protocols, interoperability mechanisms are crucial to seamless, pervasive access. This paper presents the design, rationale, and implementation of the Proteus multiprotocol library for integrating multiple message protocols, such as SOAP and JMS, within one system. Proteus decouples application code from protocol code at run-time, allowing clients to incorporate separately developed protocols without recompiling or halting. Through generic serialization, which separates the transfer syntax from the message type, protocols can also be added independently of serialization routines. We also show performance-enhancing mechanisms for Grid services that examine metadata, but pass actual data through opaquely (such as adapters). The interface provided to protocol implementors is general enough to support protocols as disparate as our current implementations: SOAP, JMS, and binary. Proteus is written in C++; a Java port is planned.","PeriodicalId":302800,"journal":{"name":"ACM/IEEE SC 2002 Conference (SC'02)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128369508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Merging Multiple Data Streams on Common Keys over High Performance Networks","authors":"Marco Mazzucco, A. Ananthanarayan, R. Grossman, Jorge Levera, G. Rao","doi":"10.1109/SC.2002.10044","DOIUrl":"https://doi.org/10.1109/SC.2002.10044","url":null,"abstract":"The model for data mining on streaming data assumes that there is a buffer of fixed length and a data stream of infinite length and the challenge is to extract patterns, changes, anomalies, and statistically significant structures by examining the data one time and storing records and derived attributes of length less than N. As data grids, data webs, and semantic webs become more common, mining distributed streaming data will become more and more important. The first step when presented with two or more distributed streams is to merge them using a common key. In this paper, we present two algorithms for merging streaming data using a common key. We also present experimental studies showing these algorithms scale in practice to OC-12 networks.","PeriodicalId":302800,"journal":{"name":"ACM/IEEE SC 2002 Conference (SC'02)","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130579623","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SmartPointers: Personalized Scientific Data Portals In Your Hand","authors":"M. Wolf, Zhongtang Cai, Weiyun Huang, K. Schwan","doi":"10.1109/SC.2002.10003","DOIUrl":"https://doi.org/10.1109/SC.2002.10003","url":null,"abstract":"The SmartPointer system provides a paradigm for utilizing multiple light-weight client endpoints in a real-time scientific visualization infrastructure. Together, the client and server infrastructure form a new type of data portal for scientific computing. The clients can be used to personalize data for the needs of the individual scientist. This personalization of a shared dataset is designed to allow multiple scientists, each with their laptops or iPaqs to explore the dataset from different angles and with different personalized filters. As an example, iPaq clients can display 2D derived data functions which can be used to dynamically update and annotate the shared data space, which might be visualized separately on a large immersive display such as a CAVE. Measurements are presented for such a system, built upon the ECho middleware system developed at Georgia Tech.","PeriodicalId":302800,"journal":{"name":"ACM/IEEE SC 2002 Conference (SC'02)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122744122","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Utilization of Departmental Computing GRID System for Development of an Artificial Intelligent Tapping Inspection Method, Tapping Sound Analysis","authors":"S. Kim, J. Hwang, C. Lee, Sangsan Lee","doi":"10.1109/SC.2002.10018","DOIUrl":"https://doi.org/10.1109/SC.2002.10018","url":null,"abstract":"Tapping Sound Analysis is a new NDE method, which determines the existence of subsurface defects by comparing the tapping sound of test structure and original healthy structure. The tapping sound of original healthy structure is named sound print of the structure and is obtained through high precision computation. Because many tapping points are required to obtain the exact sound print data, many times of tapping sound simulation are required. The simulation of tapping sound requires complicated numerical procedures. Departmental Computing GRID system was utilized to run numerical simulations. Three cluster systems and one PC-farm system comprise DCG system. Tapping sound simulations were launched and monitored through Globus and CONDOR. A total of 160 Tera floating-point (double-precision) operations was performed and the elapsed time was 41,880 sec. From the numerical experiments, Grid computing technology reduced the necessary time to make sound print database and made TSA a feasible and practical methodology.","PeriodicalId":302800,"journal":{"name":"ACM/IEEE SC 2002 Conference (SC'02)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114717794","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scalable Analysis Techniques for Microprocessor Performance Counter Metrics","authors":"D. Ahn, J. Vetter","doi":"10.1109/SC.2002.10066","DOIUrl":"https://doi.org/10.1109/SC.2002.10066","url":null,"abstract":"Contemporary microprocessors provide a rich set of integrated performance counters that allow application developers and system architects alike the opportunity to gather important information about workload behaviors. Current techniques for analyzing data produced from these counters use raw counts, ratios, and visualization techniques to help users make decisions about their application performance. While these techniques are appropriate for analyzing data from one process, they do not scale easily to new levels demanded by contemporary computing systems. Very simply, this paper addresses these concerns by evaluating several multivariate statistical techniques on these datasets. We find that several techniques, such as statistical clustering, can automatically extract important features from the data. These derived results can, in turn, be fed directly back to an application developer, or used as input to a more comprehensive performance analysis environment, such as a visualization or an expert system.","PeriodicalId":302800,"journal":{"name":"ACM/IEEE SC 2002 Conference (SC'02)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122910396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}