{"title":"Reconfigurable Designs for Networking Silicon","authors":"Tao Li, Zhentao Liu, Huimin Du, Lei Zhang, Jungang Han, Lin Jiang, Qingang Dong","doi":"10.1109/IPDPSW.2012.35","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.35","url":null,"abstract":"This paper presents a reconfigurable architecture and associated design methodology for developing networking silicon chips. The architecture includes most common traffic QoS features and low-level interfaces, as well as special features for extensible design. When coupled with the design tools, this architecture provides powerful capabilities for the design of highly flexible networking silicon IP cores.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114292545","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Model-Driven Approach for Automatic Dynamic Partially Reconfigurable IP Customization","authors":"G. Ochoa-Ruiz, Ouassila Labbani, E. Bourennane, Philippe Soulard","doi":"10.1109/IPDPSW.2012.51","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.51","url":null,"abstract":"This paper presents a framework which automates the generation of DPR-capable IP cores. The approach is based on an MDE methodology, which exploits two widely used standards for Systems-on-Chip specification, UML/MARTE and IP-XACT. The approach aims at generating IPs which incorporate different functionalities by using code templates. The templates correspond to IP-XACT components that represent VHDL modules to be implemented in the IP. The IP-XACT sub-system description is generated from the MARTE description, effectively diminishing the complexity of creating this kind of system by raising the level of abstraction. We present the MARTE modeling concepts and how these models are mapped to IP-XACT objects; emphasis is given to the generation of IP cores that can be used in the Xilinx EDK environment, since we aim to develop a complete flow around the Xilinx Dynamic Partial Reconfiguration design flow. A model for the DPR IP is presented, along with a case study for a simple IP. The use of our MDE approach is introduced to demonstrate how the generation from MARTE to EDK systems is performed.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117274305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimizing the Execution of Statistical Simulations for Human Evolution in Hyper-threaded Multicore Architectures","authors":"R. Dias, C. Rose, A. A. Gomes, N. J. Fagundes","doi":"10.1109/IPDPSW.2012.87","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.87","url":null,"abstract":"Simulations of statistical models have been used to validate theories of past events in the evolution of species. Studies concerning human evolution are important for understanding our history and biodiversity. However, these approaches use complex statistical models, leading to high computational cost. This paper proposes optimization techniques for Hyper-threaded multicore architectures to improve the computational performance of these simulations. Combining granularity studies and Hyper-threading optimization, we improved the performance of simulations by more than 30% compared with common parallel execution (the default parallelization applied by users). The performance was evaluated using a complex example of human evolution studies [1]. For this example, our techniques enable the user to decrease the simulation execution time from 50 days (sequential runtime) to less than 5 days. In addition, the evaluation has been extended to simulations running on multiple multicore cluster nodes. Our measurements show a high speedup, close to the theoretical maximum: 129 times faster for 160 computational cores. This represents an efficiency of 81%.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129431432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Analysis of Multicore Specific Optimization in MPI Implementations","authors":"Pengqi Cheng, Yan Gu","doi":"10.1109/IPDPSW.2012.231","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.231","url":null,"abstract":"We first introduce the multicore-specific optimization modules of two common MPI implementations, MPICH2 and Open MPI, and then test their performance on a multicore computer. By enabling and disabling these modules, we measure their performance, including bandwidth and latency, under different circumstances. Finally, we analyze the two MPI implementations and discuss the choice of MPI implementation and possible improvements.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130624340","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MTSD: A Task Scheduling Algorithm for MapReduce Base on Deadline Constraints","authors":"Zhuo Tang, Junqing Zhou, Kenli Li, Ruixuan Li","doi":"10.1109/IPDPSW.2012.250","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.250","url":null,"abstract":"Previous work on MapReduce task scheduling with deadline constraints takes neither the differences between Map and Reduce tasks nor the cluster's heterogeneity into account. This paper proposes an extended MapReduce task scheduling algorithm for deadline constraints on the Hadoop platform: MTSD. It allows users to specify a job's deadline and tries to finish the job before that deadline. By measuring each node's computing capacity, a node classification algorithm is proposed in MTSD. This algorithm classifies the nodes into several levels in heterogeneous clusters. Based on this classification, we first introduce a novel data distribution model which distributes data according to each node's capacity level. The experiments show that data locality is improved by about 57%. Second, we calculate the task's average completion time based on the node level, which improves the precision of estimating a task's remaining time. Finally, MTSD provides a mechanism to decide which job's tasks should be scheduled by calculating the Map and Reduce task slot requirements.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123885793","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Different Approaches to Distributed Compilation","authors":"J. Gattermayer, P. Tvrdík","doi":"10.1109/IPDPSW.2012.137","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.137","url":null,"abstract":"Source code compilation is a non-trivial task that requires many computing resources. As a software project grows, its build time increases and debugging on a single computer becomes an increasingly time-consuming task. An obvious solution would be a dedicated cluster acting as a build farm, where developers can send their requests. In most cases, however, this solution has very low utilization of the available computing resources, which makes it inefficient. Therefore, we have focused on non-dedicated clusters for distributed compilation, where users' computers serve as nodes of a build farm. We compare two different approaches: distcc, an open-source program that distributes compilation of C/C++ code between several computers on a network, and Clondike, a universal peer-to-peer cluster being developed at the Czech Technical University in Prague. A complex task that exercises both systems thoroughly is the compilation of the Linux kernel with many config options. We ran this task on a cluster of up to 20 computers and measured computing times and CPU loads. In this paper, we present the results of this experiment, which indicate the scalability and resource utilization of both systems. We also discuss the penalty of a generic solution over a task-specific one.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"110 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123315842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A MapReduce-based Algorithm for Motif Search","authors":"Hongwei Huo, Shuai Lin, Qiang Yu, Yipu Zhang, V. Stojkovic","doi":"10.1109/IPDPSW.2012.255","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.255","url":null,"abstract":"Motif search plays an important role in gene finding and understanding gene regulation relationship. Motif search is one of the most challenging problems in bioinformatics. In this paper, we present three data partitions for the PMSP algorithm and propose the PMSP MapReduce algorithm (PMSPMR) for solving the motif search problem. For instances of the problem with different difficulties, the experimental results on the Hadoop cluster demonstrate that PMSPMR has good scalability. In particular, for the more difficult motif search problems, PMSPMR shows its advantage because the speedup is almost linearly proportional to the number of nodes in the Hadoop cluster. We also present experimental results on realistic biological data by identifying known transcriptional regulatory motifs in eukaryotes as well as in actual promoter sequences extracted from Saccharomyces cerevisiae.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121141120","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Area-Efficient FPGA Implementation of Quadruple Precision Floating Point Multiplier","authors":"M. Jaiswal, R. Cheung","doi":"10.1109/IPDPSW.2012.46","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.46","url":null,"abstract":"Floating point multiplication is a crucial and useful arithmetic operation for many scientific and signal processing applications. High precision requirements of many applications lead to the incorporation of quadruple precision (QP) arithmetic. The logic complexity and performance overhead of quadruple precision arithmetic are quite large. This paper focuses on one of the quadruple precision arithmetic operations: multiplication. We present an efficient implementation of the QP multiplication operation on a reconfigurable FPGA platform. The presented design uses far fewer hardware resources in terms of DSP48 blocks and slices, with higher performance. Promising results are obtained by comparing the proposed designs with the best reported QP floating point multiplier in the literature. We have achieved more than a 50% reduction in DSP48 block usage at the slight cost of additional slices, on a Virtex-4 FPGA.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121349102","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A System for GIS Polygonal Overlay Computation on Linux Cluster - An Experience and Performance Report","authors":"Dinesh Agarwal, S. Puri, Xi He, S. Prasad","doi":"10.1109/IPDPSW.2012.180","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.180","url":null,"abstract":"GIS polygon-based (also known as vector-based) spatial data overlay computation is much more complex than raster data computation. Processing of polygonal spatial data files has been a long-standing research question in the GIS community due to the irregular and data-intensive nature of the underlying computation. The state-of-the-art software for overlay computation in the GIS community is still desktop-based. We present a cluster-based distributed solution for end-to-end polygon overlay processing, modeled after our Windows Azure cloud-based Crayons system [1]. We present the details of porting the Crayons system to an MPI-based Linux cluster and show the improvements made by employing efficient data structures such as R-trees. We present a performance report and show the scalability of our system, along with the remaining bottlenecks. Our experimental results show an absolute speedup of 15x for end-to-end overlay computation employing up to 80 cores.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"114 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116502001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Placement Strategy of Virtual Machines Based on Workload Characteristics","authors":"Jian Wan, Fei Pan, Congfeng Jiang","doi":"10.1109/IPDPSW.2012.264","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.264","url":null,"abstract":"Traditional virtual machines are over-provisioned to provide peak performance and waste a lot of system resources. In this paper, we propose and implement a placement strategy for virtual machines based on workload characteristics. In our approach, the virtual machines are placed into various groups after several iterations of matching based on the complementarity of the virtual machines' workloads. Requested resources are allocated to virtual machines placed in the same group according to the sum of the individual resource requests. The experimental results show that the resource utilization of our approach increased by 37.5% compared to traditional placement approaches, and by 12.5% compared with a non-iterative matching approach. We conclude that our approach uses fewer physical machines while providing acceptable application performance.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121569259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}