{"title":"A system for monitoring and management of computational grids","authors":"Warren Smith","doi":"10.1109/ICPP.2002.1040859","DOIUrl":"https://doi.org/10.1109/ICPP.2002.1040859","url":null,"abstract":"As organizations begin to deploy large computational grids, it has become apparent that systems for observation and control of the resources, services, and applications that make up such grids are needed. Administrators must observe resources and services to ensure that they are operating correctly and must control resources and services to ensure that their operation meets the needs of users. Users are also interested in the operation of resources and services so that they can choose the most appropriate ones to use. We describe a prototype system to monitor and manage computational grids and describe the general software framework for control and observation in distributed environments that it is based on.","PeriodicalId":393916,"journal":{"name":"Proceedings International Conference on Parallel Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134358278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distributed game-tree search using transposition table driven work scheduling","authors":"Akihiro Kishimoto, J. Schaeffer","doi":"10.1109/ICPP.2002.1040888","DOIUrl":"https://doi.org/10.1109/ICPP.2002.1040888","url":null,"abstract":"The αβ algorithm for two-player game-tree search has a notorious reputation as being a challenging algorithm for achieving reasonable parallel performance. MTD(f), a new αβ variant, has become the sequential algorithm of choice for practitioners. Unfortunately, MTD(f) inherits most of the parallel obstacles of αβ, as well as creating new performance hurdles. Transposition-table-driven scheduling (TDS) is a new parallel search algorithm that has proven to be effective in the single-agent (one-player) domain. This paper presents TDSAB, the first time TDS parallelism has been applied to two-player search (the MTD(f) algorithm). Results show that TDSAB gives comparable speedups to that achieved by conventional parallel αβ algorithms. However, since this is a parallelization of a superior sequential algorithm the results in fact are better. This paper shows that the TDS idea can be extended to more challenging search domains.","PeriodicalId":393916,"journal":{"name":"Proceedings International Conference on Parallel Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2002-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114129585","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A selection technique for replicated multicast video servers","authors":"Akihito Hiromori, H. Yamaguchi, K. Yasumoto, T. Higashino, K. Taniguchi","doi":"10.1109/ICPP.2002.1040913","DOIUrl":"https://doi.org/10.1109/ICPP.2002.1040913","url":null,"abstract":"In this paper, we propose a selection technique for replicated multicast video servers. We assume that each replicated video server transmits the same video source as different quality levels of multicast streams. Using an IGMP facility like m-trace, each receiver monitors packet count information of those streams on routers and periodically selects the one which is expected to provide low loss rate and to be suitable for the current available bandwidth of receivers. Moreover, collection of packet count information is done in a scalable and efficient manner by sharing the collected information across receivers. Our experimental results using the network simulator have shown that our method could achieve much higher quality satisfaction of receivers, under the reasonable amount of tracing traffic.","PeriodicalId":393916,"journal":{"name":"Proceedings International Conference on Parallel Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2002-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117071357","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On-line permutation routing on WDM all-optical networks","authors":"Q. Gu","doi":"10.1109/ICPP.2002.1040898","DOIUrl":"https://doi.org/10.1109/ICPP.2002.1040898","url":null,"abstract":"For a sequence (s_1, t_1), ..., (s_i, t_i), ... of routing requests with (s_i, t_i) arriving at time step i on the wavelength-division multiplexing (WDM) all-optical network, the on-line routing problem is to set-up a path s_i → t_i and assign a wavelength to the path in step i such that the paths set-up so far with the same wavelength are edge-disjoint. Two measures are important for on-line routing algorithms: the number of wavelengths used and the response time. The sequence (s_1, t_1), ..., (s_i, t_i), ... is called a permutation if each node in the network appears in the sequence at most once as a source and at most once as a destination. Let H_n be the n-dimensional WDM all-optical hypercube. We develop two on-line routing algorithms on H_n. Our first algorithm is a deterministic one which realizes any permutation by at most ⌈3(n-1)/2⌉ + 1 wavelengths with response time O(2^n). The second algorithm is a randomized one which realizes any permutation by at most (3/2 + δ)(n-1) wavelengths, where δ can be any value satisfying δ ≥ 2/(n-1). The average response time of the algorithm is O(n(1 + δ)/δ). Both algorithms use at most O(n) wavelengths for the permutation on H_n. This improves the previous bound of O(n^2).","PeriodicalId":393916,"journal":{"name":"Proceedings International Conference on Parallel Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2002-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121577771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The tracefile testbed - a community repository for identifying and retrieving HPC performance data","authors":"K. Ferschweiler, S. Harrah, D. Keon, M. Calzarossa, D. Tessera, C. Pancake","doi":"10.1109/ICPP.2002.1040872","DOIUrl":"https://doi.org/10.1109/ICPP.2002.1040872","url":null,"abstract":"High-performance computing (HPC) programmers utilize tracefiles, which record program behavior in great detail, as the basis for many performance analysis activities. The lack of generally accessible tracefiles has forced programmers to develop their own testbeds in order to study the basic performance characteristics of the platforms they use. Since tracefiles serve as input to performance analysis and performance prediction tools, tool developers have also been hindered by the lack of a testbed for verifying and fine-tuning tool functionality. We created a community repository that meets the needs of both application and tool developers. In this paper, we describe how the tracefile testbed was designed to facilitate flexible searching and retrieval of tracefiles based on a variety of characteristics. Its Web-based interface provides a convenient mechanism for browsing, downloading, and uploading collections of tracefiles and tracefile segments, as well as viewing statistical summaries of performance characteristics.","PeriodicalId":393916,"journal":{"name":"Proceedings International Conference on Parallel Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2002-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131397248","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Worst case analysis of a greedy multicast algorithm in k-ary n-cubes","authors":"S. Fujita","doi":"10.1109/ICPP.2002.1040908","DOIUrl":"https://doi.org/10.1109/ICPP.2002.1040908","url":null,"abstract":"In this paper, we consider the problem of multicasting a message in k-ary n-cubes under the store-and-forward model. The objective of the problem is to minimize the size of the resultant multicast tree by keeping the distance to each destination over the tree the same as the distance in the original graph. In the following, we first propose an algorithm that grows a multicast tree in a greedy manner, in the sense that for each intermediate vertex of the tree, the outgoing edges of the vertex are selected in a non-increasing order of the number of destinations that can use the edge in a shortest path to the destination. We then evaluate the goodness of the algorithm in terms of the worst case ratio of the size of the generated tree to the size of an optimal tree. It is proved that for any k ≥ 5 and n ≥ 6, the performance ratio of the greedy algorithm is c × kn-o(n) for some constant 1/1.2 ≤ c ≤ 1/2.","PeriodicalId":393916,"journal":{"name":"Proceedings International Conference on Parallel Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2002-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131563982","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design and evaluation of scalable switching fabrics for high-performance routers","authors":"N. Tzeng, Ravi C. Batchu","doi":"10.1109/ICPP.2002.1040871","DOIUrl":"https://doi.org/10.1109/ICPP.2002.1040871","url":null,"abstract":"This work considers switching fabrics with distributed packet routing to achieve high scalability and low costs. The considered switching fabrics are based on a multistage structure with different re-circulation designs, where adjacent stages are interconnected according to the indirect n-cube connection style. They all compare favorably with an earlier multistage-based counterpart according to extensive simulation, in terms of performance measures of interest and hardware complexity. When queues are incorporated in the output ports of switching elements (SEs), the total number of stages required in our proposed fabrics to reach a given performance level can be reduced substantially. The performance of those fabrics with output queues is evaluated under different \"speedups\" of the queues, where the speedup is the operating clock rate ratio of that at the SE core to that over external links. Our simulation reveals that a small speedup of 2 is adequate for buffered switching fabrics comprising 4×8 SEs to deliver better performance than their unbuffered counterparts with 50% more stages of SEs, when the fabric size is 256. The buffered switching fabrics under our consideration are scalable and of low costs, ideally suitable for constructing high-performance routers with large numbers of line cards.","PeriodicalId":393916,"journal":{"name":"Proceedings International Conference on Parallel Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2002-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127106497","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Software caching using dynamic binary rewriting for embedded devices","authors":"Chad Huneycutt, J. Fryman, K. Mackenzie","doi":"10.1109/ICPP.2002.1040920","DOIUrl":"https://doi.org/10.1109/ICPP.2002.1040920","url":null,"abstract":"A software cache implements instruction and data caching entirely in software. Dynamic binary rewriting offers a means to specialize the software cache miss checks at cache miss time. We describe a software cache system implemented using dynamic binary rewriting and observe that the combination is particularly appropriate for the scenario of a simple embedded system connected to a more powerful server over a network. As two examples, consider a network of sensors with local processing or cell phones connected to cell towers. We describe two software cache systems for instruction caching only using dynamic binary rewriting and present results for the performance of instruction caching in these systems. We measure time overheads of 19% compared to no caching. We also show that we can guarantee a 100% hit rate for codes that fit in the cache. For comparison, we estimate that a comparable hardware cache would have space overhead of 12-18% for its tag array and would offer no hit rate guarantee.","PeriodicalId":393916,"journal":{"name":"Proceedings International Conference on Parallel Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2002-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129300718","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Introducing SCSI-to-IP cache for storage area networks","authors":"Xubin He, Qing Yang, Ming Zhang","doi":"10.1109/ICPP.2002.1040875","DOIUrl":"https://doi.org/10.1109/ICPP.2002.1040875","url":null,"abstract":"Data storage plays an essential role in today's fast-growing data-intensive network services. iSCSI is one of the most recent standards that allow SCSI protocols to be carried out over IP networks. However, the disparities between SCSI and IP prevent fast and efficient deployment of SAN (storage area network) over IP. This paper introduces STICS (SCSI-To-IP cache storage), a novel storage architecture that couples reliable and high-speed data caching with low-overhead conversion between SCSI and IP protocols. Through the efficient caching algorithm and localization of certain unnecessary protocol overheads, STICS significantly improves performance over current iSCSI system. Furthermore, STICS can be used as a basic plug-and-play building block for data storage over IP. We have implemented software STICS prototype on Linux operating system. Numerical results using popular PostMark benchmark program and EMC's trace have shown dramatic performance gain over the current iSCSI implementation.","PeriodicalId":393916,"journal":{"name":"Proceedings International Conference on Parallel Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2002-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122366681","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analysis of memory hierarchy performance of block data layout","authors":"Neungsoo Park, Bo Hong, V. Prasanna","doi":"10.1109/ICPP.2002.1040857","DOIUrl":"https://doi.org/10.1109/ICPP.2002.1040857","url":null,"abstract":"Recently, several experimental studies have been conducted on block data layout as a data transformation technique used in conjunction with tiling to improve cache performance. We provide a theoretical analysis for the TLB and cache performance of block data layout. For standard matrix access patterns, we derive an asymptotic lower bound on the number of TLB misses for any data layout and show that block data layout achieves this bound. We show that block data layout improves TLB misses by a factor of O(B) compared with conventional data layouts, where B is the block size of block data layout. This reduction contributes to the improvement in memory hierarchy performance. Using our TLB and cache analysis, we also discuss the impact of block size on the overall memory hierarchy performance. These results are validated through simulations and experiments on state-of-the-art platforms.","PeriodicalId":393916,"journal":{"name":"Proceedings International Conference on Parallel Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2002-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131956599","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}