{"title":"A fine-grained parallel pipelined Karhunen-Loeve transform","authors":"M. Fleury, Bob Self, A. Downton","doi":"10.1109/IPDPS.2003.1213476","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213476","url":null,"abstract":"A high-performance Karhunen-Loeve transform for multi-spectral imagery suitable for remote-sensing applications has been prototyped on a platform FPGA, by means of a PC-based development board. Performance estimates suggest that the design will already outperform implementation on a high-end microprocessor, given due attention to I/O (input/output). General conclusions are reached for the utility of this architecture for fine-grained parallel processing, when the design is extended to massively parallel processing.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133648803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A programmable and highly pipelined PPP architecture for Gigabit IP over SDH/SONET","authors":"C. Toal, S. Sezer","doi":"10.1109/IPDPS.2003.1213331","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213331","url":null,"abstract":"This paper details the implementation of a highly pipelined 2.5 Gbit/s point-to-point-protocol packet processor (P⁵) aimed at the latest system-on-a-programmable-chip (SoPC) technology. Throughput rates beyond 2.5 Gbit/s based on FPGA technology could be achieved by designing a new highly pipelined and parallel processing architecture for frames and datagrams. A novel pipelined data sorting mechanism with an extremely low resynchronization buffer and backpressure scheme are introduced to keep the data memory requirements as low as possible for embedded on-chip applications.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"106 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131913111","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"System management in the BlueGene/L supercomputer","authors":"G. Almási, Leonardo R. Bachega, Ralph Bellofatto, J. Brunheroto, Calin Cascaval, J. Castaños, P. Crumley, C. Erway, J. Gagliano, D. Lieber, Pedro Mindlin, J. Moreira, R. Sahoo, A. Sanomiya, E. Schenfeld, R. Swetz, M. Bae, G. Laib, K. Ranganathan, Y. Aridor, T. Domany, Ya'akov Gal, O. Goldshmidt, Edi Shmueli","doi":"10.1109/IPDPS.2003.1213483","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213483","url":null,"abstract":"The BlueGene/L supercomputer will use system-on-a-chip integration and a highly scalable cellular architecture to deliver 360 teraflops of peak computing power. With 65536 compute nodes, BlueGene/L represents a new level of scalability for parallel systems. As such, it is natural for many scalability challenges to arise. In this paper, we discuss system management and control, including machine booting, software installation, user account management, system monitoring, and job execution. We address the issue of scalability by organizing the system hierarchically. The 65536 compute nodes are organized in 1024 clusters of 64 compute nodes each, called processing sets. Each processing set is under control of a 65th node, called an I/O node. The 1024 processing sets can then be managed to a great extent as a regular Linux cluster, of which there are several successful examples. Regular cluster management is complemented by BlueGene/L specific services, performed by a service node over a separate control network. Our software development and experiments have been conducted so far using an architecturally accurate simulator of BlueGene/L, and we are gearing up to test real prototypes in 2003.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"100 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134332041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A performance analysis of 4X InfiniBand data transfer operations","authors":"Ariel Cohen","doi":"10.1109/IPDPS.2003.1213372","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213372","url":null,"abstract":"The performance of 4X InfiniBand send/receive and RDMA operations is studied by running tests to measure latency, data rate, number of operations per second, and CPU load. The measurements performed are for application-to-application data transfers using user-level InfiniBand (IB) verbs. It is shown that IB is capable of low latencies (10 μs for small messages) and very high data rates at low CPU loads (over 6 Gb/s with 64 KB messages at under 20% CPU load). A very large number of operations per second (over 400,000) is obtained for small messages. Some comparisons are made with the performance of TCP/IP on Gigabit Ethernet. In addition, the paper studies the impact of varying the number of outstanding requests on the obtained throughput, and shows when the peak throughput can be obtained for messages of varying sizes. Finally, an approach for handling completions in user space without a busy wait and without the use of signals is introduced and CPU load results based on this approach are presented.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134074367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A reconfigurable processor architecture and software development environment for embedded systems","authors":"F. Campi, A. Cappelli, R. Guerrieri, Andrea Lodi, M. Toma, A. L. Rosa, L. Lavagno, C. Passerone, R. Canegallo","doi":"10.1109/IPDPS.2003.1213314","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213314","url":null,"abstract":"Flexibility, high computing power and low energy consumption are strong guidelines when designing new generation embedded processors. Traditional architectures are no longer suitable to provide a good compromise among these contradictory implementation requirements. In this paper we present a new reconfigurable processor that tightly couples a VLIW architecture with a configurable unit implementing an additional configurable pipeline. A software development environment is also introduced providing a user-friendly tool for application development and performance simulation. Finally, we show that the HW/SW reconfigurable platform proposed achieves dramatic improvement in both speed and energy consumption on signal processing computation kernels.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114500157","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallel tabu search in a heterogeneous environment","authors":"Ahmad A. Al-Yamani, S. M. Sait, H. Barada, H. Youssef","doi":"10.1109/IPDPS.2003.1213149","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213149","url":null,"abstract":"We discuss a parallel tabu search algorithm with implementation in a heterogeneous environment. Two parallelization strategies are integrated: functional decomposition and multi-search threads. In addition, a domain decomposition strategy is implemented probabilistically. The performance of each strategy is observed and analyzed in terms of speeding up the search and finding better quality solutions. Experiments were conducted for VLSI cell placement. The objective was to achieve the best possible solution in terms of interconnection length, timing performance, circuit speed, and area. The multiobjective nature of this problem is addressed using a fuzzy goal-based cost computation.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114514108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Management of peer-to-peer systems","authors":"M. Ciglarič, T. Vidmar","doi":"10.1109/IPDPS.2003.1213448","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213448","url":null,"abstract":"In the near future, peer-to-peer architecture is likely to enter into several new application areas, including e-commerce networking. The paper presents selected management-related issues within the area of peer-to-peer systems: management of network load, overhead traffic, message routing, security and anonymity. Environment factors affecting managerial decisions are introduced and suggestions for domain identification are given, followed by a comprehensive list of appropriate policy rules. After that, the paper describes possible actions to be taken in cases when the conditions representing unwanted system behaviour are met.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125682708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reconfigurable mapping functions for online architectures","authors":"Shyamnath Harinath, R. Sass","doi":"10.1109/IPDPS.2003.1213318","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213318","url":null,"abstract":"Content addressable memory is an expensive component in fixed architecture systems; however, it may prove to be a valuable tool in online architectures (that is, run-time reconfigurable systems with an online decision algorithm to determine the next reconfiguration). In this paper we define a related problem called an arbitrary mapping function and describe an online architecture. We look at four implementations of an arbitrary mapping function component and compare them in terms of space (number of CLBs used), reconfiguration time, and component latency. All of the implementations offer low latency, which is the primary reason to use a content addressable memory or an arbitrary mapping function. Three of the implementations trade large size for very fast reconfiguration while the last implementation is extremely conservative in space but has a large reconfiguration time.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124945066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A solution for handling hybrid traffic in clustered environments: the MultiMedia Router MMR","authors":"María Blanca Caminero, C. Carrión, F. Quiles, J. Duato, S. Yalamanchili","doi":"10.1109/IPDPS.2003.1213362","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213362","url":null,"abstract":"The primary objective of the MultiMedia Router (MMR) project is the design and implementation of a compact router optimized for multimedia applications. The router is targeted for use in cluster and LAN interconnection networks, which offer different constraints and therefore differing router solutions than WANs. Among the key elements within the router are the algorithms used to decide the forwarding order of the information that traverses it: the link and switch scheduling algorithms. They help greatly to determine the QoS guarantees delivered to the application flows. Also, conventional best-effort traffic should be seamlessly integrated by scheduling algorithms, in such a way that link bandwidth is efficiently used, but without degrading the QoS guarantees of the multimedia connections. In this paper, two solutions for switch scheduling are thoroughly evaluated with mixed workloads (i.e., composed of multimedia and best-effort traffic), and their performance is compared to another well-known approach for switch scheduling that does not consider QoS requirements when performing scheduling decisions. Results show that, when a QoS-aware switch scheduler is used, the QoS received by the multimedia flows is not affected by the presence of best-effort traffic.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134154214","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A low cost fault tolerant packet routing for parallel computers","authors":"Valentin Puente, J. Gregorio, R. Beivide, F. Vallejo","doi":"10.1109/IPDPS.2003.1213132","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213132","url":null,"abstract":"This paper presents a new switching mechanism to tolerate arbitrary faults in interconnection networks with a negligible implementation cost. Although our routing technique can be applied to any regular or irregular topology, in this paper we focus on its application to k-ary n-cube networks when managing both synthetic and real traffic workloads. Our mechanism is effective regardless of the number of faults and their configuration. When the network is working without any fault, no overhead is added to the original routing scheme. In the presence of a low number of faults, the network sustains a performance close to that observed under fault-free conditions. Finally, when the number of faults increases, the system exhibits a graceful performance degradation.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134389430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}