{"title":"Component labeling for k-concave binary images using an FPGA","authors":"Yasuaki Ito, K. Nakano","doi":"10.1109/IPDPS.2008.4536129","DOIUrl":"https://doi.org/10.1109/IPDPS.2008.4536129","url":null,"abstract":"Connected component labeling is a task that assigns unique IDs to the connected components of a binary image. The main contribution of this paper is to present a hardware connected component labeling algorithm for k-concave binary images designed and implemented in FPGA. Pixels of a binary image are given to the FPGA in raster order, and the resulting labels are also output in the same order. The advantage of our labeling algorithm is small latency and to use a small internal storage of the FPGA. We have implemented our hardware labeling algorithm in an Altera Stratix Family FPGA, and evaluated the performance. The implementation result shows that for a 10-concave binary image of 2048 times 2048, our connected component labeling algorithm runs in approximately 70 ms and its latency is approximately 750 ns.","PeriodicalId":162608,"journal":{"name":"2008 IEEE International Symposium on Parallel and Distributed Processing","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126781828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design and implementation of a tool for modeling and programming deadlock free meta-pipeline applications","authors":"S. Yamagiwa, L. Sousa","doi":"10.1109/IPDPS.2008.4536121","DOIUrl":"https://doi.org/10.1109/IPDPS.2008.4536121","url":null,"abstract":"The Caravela platform has been designed to develop a parallel and distributed stream-based computing paradigm, namely supported on the pipeline processing approach herein designated by meta-pipeline. This paper is focused on the design and implementation of a modeling tool for the meta-pipeline, namely to tackle the deadlock problem due to uninitialized input data stream in a pipeline-model. A new efficient algorithm is proposed to prevent deadlock situations by detecting uninitialized edges in a pipeline graph. The algorithm identifies the cyclic paths in a pipeline-graph and builds a reduced list with only the true cyclic paths that have to be really initialized. Further optimization techniques are also proposed to reduce the computation time and the required amount of memory. Moreover, this paper also presents a Graphical User Interface (GUI) for easy programming meta-pipeline applications, which provides an automatic validation procedure based on the proposed algorithm. Experimental results presented in this paper show the effectiveness of both the proposed algorithm and the developed GUI.","PeriodicalId":162608,"journal":{"name":"2008 IEEE International Symposium on Parallel and Distributed Processing","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126853190","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gaurav Khanna, Ümit V. Çatalyürek, T. Kurç, R. Kettimuthu, P. Sadayappan, J. Saltz
{"title":"A dynamic scheduling approach for coordinated wide-area data transfers using GridFTP","authors":"Gaurav Khanna, Ümit V. Çatalyürek, T. Kurç, R. Kettimuthu, P. Sadayappan, J. Saltz","doi":"10.1109/IPDPS.2008.4536325","DOIUrl":"https://doi.org/10.1109/IPDPS.2008.4536325","url":null,"abstract":"Many scientific applications need to stage large volumes of files from one set of machines to another set of machines in a wide-area network. Efficient execution of such data transfers needs to take into account the heterogeneous nature of the environment and dynamic availability of shared resources. This paper proposes an algorithm that dynamically schedules a batch of data transfer requests with the goal of minimizing the overall transfer time. The proposed algorithm performs simultaneous transfer of chunks of files from multiple file replicas, if the replicas exist. Adaptive replica selection is employed to transfer different chunks of the same file by taking dynamically changing network band- widths into account. We utilize GridFTP as the underlying mechanism for data transfers. The algorithm makes use of information from past GridFTP transfers to estimate network bandwidths and resource availability. The efficiency of the algorithm is evaluated on a wide-area testbed.","PeriodicalId":162608,"journal":{"name":"2008 IEEE International Symposium on Parallel and Distributed Processing","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123414333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. Becker, F. Henrici, S. Trendelenburg, Y. Manoli
{"title":"A rapid prototyping environment for high-speed reconfigurable analog signal processing","authors":"J. Becker, F. Henrici, S. Trendelenburg, Y. Manoli","doi":"10.1109/IPDPS.2008.4536511","DOIUrl":"https://doi.org/10.1109/IPDPS.2008.4536511","url":null,"abstract":"This paper reports on a rapid-prototyping platform for high-frequency continuous-time analog filters to be used in communication front-ends. A field programmable analog array (FPAA) is presented, which implements a unique hexagonal topology of 55 tunable OTAs for reconfigurable instantiation of Gm-C filters in a 0.13 mum CMOS technology. It is the first analog array to achieve a bandwidth, which allows processing of intermediate frequencies used in communication systems. In addition to the intuitive manual mapping of analog filters to the chip structure, a genetic algorithm with hardware in the loop is used for automated synthesis of transfer functions.","PeriodicalId":162608,"journal":{"name":"2008 IEEE International Symposium on Parallel and Distributed Processing","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126474001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
D. Göhringer, M. Hübner, Laure Hugot-Derville, J. Becker
{"title":"Runtime adaptive multi-processor system-on-chip: RAMPSoC","authors":"D. Göhringer, M. Hübner, Laure Hugot-Derville, J. Becker","doi":"10.1109/ICSAMOS.2010.5642043","DOIUrl":"https://doi.org/10.1109/ICSAMOS.2010.5642043","url":null,"abstract":"Current trends in high performance computing show, that the usage of multiprocessor systems on chip are one approach for the requirements of computing intensive applications. The multiprocessor system on chip (MPSoC) approaches often provide a static and homogeneous infrastructure of networked microprocessor on the chip die. A novel idea in this research area is to introduce the dynamic adaptivity of reconfigurable hardware in order to provide a flexible heterogeneous set of processing elements during run-time. This extension of the MPSoC idea by introducing run-time reconfiguration delivers a new degree of freedom for system design as well as for the optimized distribution of computing tasks to the adapted processing cells on the architecture related to the changing application requirements. The \"computing in time and space\"paradigm and the extension with the new degree of freedom for MPSoCs will be presented with the RAMPSoC approach described in this paper.","PeriodicalId":162608,"journal":{"name":"2008 IEEE International Symposium on Parallel and Distributed Processing","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126622838","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Low power/area branch prediction using complementary branch predictors","authors":"Resit Sendag, J. Yi, Peng-fei Chuang, D. Lilja","doi":"10.1109/IPDPS.2008.4536323","DOIUrl":"https://doi.org/10.1109/IPDPS.2008.4536323","url":null,"abstract":"Although high branch prediction accuracy is necessary for high performance, it typically comes at the cost of larger predictor tables and/or more complex prediction algorithms. Unfortunately, large predictor tables and complex algorithms require more chip area and have higher power consumption, which precludes their use in embedded processors. As an alternative to large, complex branch predictors, in this paper, we investigate adding complementary branch predictors (CBP) to embedded processors to reduce their power consumption and/or improve their branch prediction accuracy. A CBP differs from a conventional branch predictor in that it focuses only on frequently mispredicted branches while letting the conventional branch predictor predict the more predictable ones. Our results show that adding a small 16-entry (28 byte) CBP reduces the branch misprediction rate of static, bimodal, and gshare branch predictors by an average of 51.0%, 42.5%, and 39.8%, respectively, across 38 SPEC 2000 and MiBench benchmarks. Furthermore, a 256-entry CBP improves the energy-efficiency of the branch predictor and processor up to 97.8% and 23.6%, respectively. Finally, in addition to being very energy-efficient, a CBP can also improve the processor performance and, due to its simplicity, can be easily added to the pipeline of any processor.","PeriodicalId":162608,"journal":{"name":"2008 IEEE International Symposium on Parallel and Distributed Processing","volume":"57 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114124054","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A neocortex model implementation on reconfigurable logic with streaming memory","authors":"Christopher N. Vutsinas, T. Taha, Kenneth L. Rice","doi":"10.1109/IPDPS.2008.4536533","DOIUrl":"https://doi.org/10.1109/IPDPS.2008.4536533","url":null,"abstract":"In this paper we study the acceleration of a new class of cognitive processing applications based on the structure of the neocortex. Our focus is on a model of the visual cortex used for image recognition developed by George and Hawkins. We propose techniques to accelerate the algorithm using reconfigurable logic, specifically a streaming memory architecture utilizing available off-chip memory. We discuss the design of a streaming memory access unit enabling a large number of processing elements to be placed on a single FPGA thus increasing throughput. We present an implementation of our approach on a Cray XD1 and discuss possible extension to further increase throughput. Our results indicate that using a two FPGA design with streaming memory gives a speedup of 71.9 times over a purely software implementation.","PeriodicalId":162608,"journal":{"name":"2008 IEEE International Symposium on Parallel and Distributed Processing","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114131463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive B-Greedy (ABG): A simple yet efficient scheduling algorithm","authors":"Hongyang Sun, W. Hsu","doi":"10.1109/IPDPS.2008.4536546","DOIUrl":"https://doi.org/10.1109/IPDPS.2008.4536546","url":null,"abstract":"In order to improve processor utilizations on parallel systems, adaptive scheduling with parallelism feedback was recently proposed. A-Greedy, an existing adaptive scheduler, offers provably-good job execution time and processor utilization. Unfortunately, it suffers from unstable feedback and hence unnecessary processor reallocations even when the job has constant parallelism. This problem may cause difficulties in the management of system resources. We propose a new adaptive scheduler called ABG (for Adaptive B-Greedy), which ensures both performance and stability. In a direct comparison with A-Greedy using simulated data- parallel jobs, ABG shows an average 50% reduction in wasted processor cycles and an average 20% improvement in running time. For a set of jobs, ABG also outperforms A-Greedy by 10% to 15% on average in terms of both makespan and mean response time, provided the system is not heavily loaded. Our detailed analysis shows that ABG indeed offers improved transient and steady-state behaviors in terms of control-theoretic metrics. Using trim analysis, we show that ABG provides nearly linear speedup for individual jobs and good processor utilizations. Using competitive analysis, we also show that ABG offers good makespan and mean response time bounds.","PeriodicalId":162608,"journal":{"name":"2008 IEEE International Symposium on Parallel and Distributed Processing","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125212083","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Defining a simple metric for real-time security level evaluation of multi-sites networks","authors":"Abdoul Karim Ganame, J. Bourgeois","doi":"10.1109/IPDPS.2008.4536562","DOIUrl":"https://doi.org/10.1109/IPDPS.2008.4536562","url":null,"abstract":"In previous research work, we have developed a centralized security operation center (SOC) [2] and a distributed SOC [4]. These environments are very useful to react to intrusions or to analyze security problem because they provide a global view of the network without adding any kinds of software on network components. They therefore lack the possibility to have a real-time metric which measures the security health of the different sites. The idea is to have, in one look, an indication of the security level of all the sites of the network. In this article, we propose to define such a metric which gives the user 3 states for a given network.","PeriodicalId":162608,"journal":{"name":"2008 IEEE International Symposium on Parallel and Distributed Processing","volume":"150 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122465708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A parallel insular model for location areas planning in mobile networks","authors":"Laidi Foughali, E. Talbi, M. Batouche","doi":"10.1109/IPDPS.2008.4536367","DOIUrl":"https://doi.org/10.1109/IPDPS.2008.4536367","url":null,"abstract":"The main interest of this paper is the optimization of the location areas planning in cellular radio networks. It is well known that the quality of service in mobile networks depends on many parameters, among them an optimal location area planning. Furthermore, it is more interesting to provide a logical organization for the already deployed networks. In this paper, we propose the use of heuristics strategies and hybrid metaheuristics strategies to solve the location areas planning problem. The latter is formulated as a constrained planar graph partitioning problem by using a mathematical model which is based on a very realistic specification. Heuristics strategies are based on greedy algorithms while hybrid metaheuristics are based on genetic algorithms. New genetic operators have been designed to this specific problem. Moreover, parallel approaches have been proposed to improve the quality of solutions and speedup the search. Results obtained on real-life benchmarks show the effectiveness of the developed optimization algorithms.","PeriodicalId":162608,"journal":{"name":"2008 IEEE International Symposium on Parallel and Distributed Processing","volume":"353 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122785928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}