{"title":"Discrete Min-Energy Scheduling on Restricted Parallel Processors","authors":"Xibo Jin, Fa Zhang, Zhiyong Liu","doi":"10.1109/IPDPSW.2013.43","DOIUrl":"https://doi.org/10.1109/IPDPSW.2013.43","url":null,"abstract":"Different from the previous work on energy-efficient algorithms, which focused on assumption that a task can be assigned to any processor, we study the problem of task Scheduling with the objective of Energy Minimization on Restricted Parallel Processors (SEMRPP). Restriction accounts for affinities between tasks and processors, that is, a task has its own eligible processing set of processors. It assumes all tasks have a prescribed deadline on the execution time. We study the processors run at a finite number of distinct speeds, and the processors cannot change its speed during the computation of a task. Our work is motivated by the practical variable voltage processors that they cannot run at arbitrary speed and the task may be failure if the processor adjusts its speed during the computation of the task. We assess the complexity of the problem and present a polynomial time approximation algorithm with a bounded factor related to the adjacent speed ratio.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133110119","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hardware MPI-2 Functions for Multi-Processing Reconfigurable System on Chip","authors":"R. C. G. N. Ewo, Emmanuel Kiegaing, M. Mbouenda, H. Fotsin, B. Granado","doi":"10.1109/IPDPSW.2013.147","DOIUrl":"https://doi.org/10.1109/IPDPSW.2013.147","url":null,"abstract":"In this paper we describe a hardware implementation, of the MPI-2 RMA communication library primitive, devoted to a distributed Multi Processing Reconfigurable System on Chip (MP-RSoC). We designed a platform able to process communications over a custom heterogeneous MP-RSoC using our hardware MPI-2 RMA communication primitives. To implement these primitives, we have conceived a scalable Network on Chip based on a crossbar. MPI-2 RMA primitives are directly usable in hardware tasks in the MP-RSoC to transfer data between all resources, either hardware or software. We also show that using message passing for parallel programming can have benefits in term of scalability and heterogeneity. Our hardware primitives have been implemented and tested on Xilinx FPGA spartan6 board.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133181811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A NUMA-Aware Fine Grain Parallelization Framework for Multi-core Architecture","authors":"C. Rossignon, P. Hénon, Olivier Aumage, Samuel Thibault","doi":"10.1109/IPDPSW.2013.204","DOIUrl":"https://doi.org/10.1109/IPDPSW.2013.204","url":null,"abstract":"We present some solutions to handle two problems commonly encountered when dealing with fine grain parallelization on multi-core architecture: Expressing algorithms using a task grain size suitable for the hardware and minimizing the time penalty due to Non Uniform Memory Accesses. To evaluate the benefit of our work we present some experiments on the fine grain parallelization of an iterative solver for sparse linear systems with some comparisons with the Intel TBB approach.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133288773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"InfoStor: Highly Available Distributed Block Store","authors":"Yongjian Ren, YouQing Lin, Jilin Zhang, Jian Wan, Congfeng Jiang","doi":"10.1109/IPDPSW.2013.41","DOIUrl":"https://doi.org/10.1109/IPDPSW.2013.41","url":null,"abstract":"In order to adapt to the requirements of the massive scale storage environments, and improve storage space utilization of the data center host, we designed and implemented InfoStor, a heterogeneous environment, distributed block storage system. Through in-band storage virtualization technology that provides the reliability of traditional enterprise arrays with low cost and better scalability; provide a copy of redundancy and back-end support the adaptive copy of reconstruction; Using consistent hashing algorithm based on virtual node to manage replica of storage backend to improve the load balance and performance of the distributed block storage system. The evaluation experiments demonstrate that compare to ordinary iSCSI network hard disk, InfoStor reach considerable data processing capabilities I/O throughput, it is foreseeable that system performance did not impact by storage virtualization and distributed storage backend.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128923016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"K-Selection Protocols from Energetic Complexity Perspective","authors":"Marcin Kardas, M. Klonowski, Kamil Wolny, Dominik Pajak","doi":"10.1109/IPDPSW.2013.80","DOIUrl":"https://doi.org/10.1109/IPDPSW.2013.80","url":null,"abstract":"In this paper we discuss energetic complexity aspects of k-Selection protocols for the single-hop radio network (that is equivalent to Multiple Access Channel model). The aim is to grant each of k activated stations exclusive access to the communication channel. We consider both deterministic as well as randomized model. Our main goal is to investigate relations between minimal time of execution (time complexity) and energy consumption (energetic complexity). We present lower bound for energetic complexity for some classes of protocols for k-Selection. We also present randomized protocol efficient in terms of both time and energetic complexity.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116042446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Revisiting a Pattern for Processing Combinatorial Objects in Parallel","authors":"C. Trefftz, J. Scripps","doi":"10.1109/IPDPSW.2013.123","DOIUrl":"https://doi.org/10.1109/IPDPSW.2013.123","url":null,"abstract":"The unrank pattern to process combinatorial objects in parallel is revisited. The pattern is applied to find, in parallel, solutions to a restricted version of the community finding problem on small graphs. Performance results obtained on a shared memory machine, a cluster of workstations and a Graphical Processing Unit (GPU) are included.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116353993","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Compiler Analysis to Determine Useful Cache Size for Energy Efficiency","authors":"Sanket Tavarageri, P. Sadayappan","doi":"10.1109/IPDPSW.2013.268","DOIUrl":"https://doi.org/10.1109/IPDPSW.2013.268","url":null,"abstract":"As processor and memory system speeds have significantly diverged, system designers have introduced ever larger caches in an effort to supply the processor with data at a rate it is capable of processing it. However, application characteristics vary and not all programs can effectively utilize large caches due to their inherent data reuse properties. The inability to use all the available cache capacity leads to wasted cache power dissipation. The rising specter of \"dark silicon\" makes it critical to avoid wasted power on a chip. In this paper, we develop a compile-time approach to analyze data reuse characteristics of affine computations and deduce the useful cache size(s) for a given system configuration. The non-useful cache can be power-gated to save power. Analysis of benchmarks shows that significant fractions of the last level cache of current processors may be turned off with no performance loss.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117029053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sequential and Parallel Restart Policies for Constraint-Based Local Search","authors":"Y. Caniou, P. Codognet","doi":"10.1109/IPDPSW.2013.211","DOIUrl":"https://doi.org/10.1109/IPDPSW.2013.211","url":null,"abstract":"We study in this paper the influence of the restart policy on the sequential and parallel performance of combinatorial search problems. Our evaluation relies on several experiments using a constraint-based local search method, named Adaptive Search, and a few combinatorial problems such as Magic Square and Costas Array Problems.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116862388","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Co-simulation of Functional SystemC TLM Models with Power/Thermal Solvers","authors":"Tayeb Bouhadiba, M. Moy, F. Maraninchi, J. Cornet, L. Maillet-Contoz, Ilija Materic","doi":"10.1109/IPDPSW.2013.206","DOIUrl":"https://doi.org/10.1109/IPDPSW.2013.206","url":null,"abstract":"Modern systems-on-chips need sophisticated power-Management policies to control their power consumption and temperature. These power-management policies are usually implemented partly in software, with hardware support. They need to be validated early, hence power and temperature-aware simulation techniques at the system-level need to be developed. Existing approaches for system-level power and thermal analysis usually either completely abstract the functionality (allowing only simple scenarios to be simulated), or run the functional simulation independently from the non-functional one. The approach presented in this paper allows a coupled simulation of a SystemC/TLM model, possibly including the actual embedded software, with a power and temperature solver such as ATMI or the commercial tool ACEplorer. Power and temperature analysis is done based on the stimuli sent by the SystemC/TLM platform, which in turn can take decisions based on the non-functional simulation.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115191683","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Introducing Parallel Programming in Undergraduate Curriculum","authors":"Cordelia M. Brown, Yung-Hsiang Lu, S. Midkiff","doi":"10.1109/IPDPSW.2013.270","DOIUrl":"https://doi.org/10.1109/IPDPSW.2013.270","url":null,"abstract":"This paper summarizes our experiences and findings in teaching the concepts of parallel computing in two undergraduate programming courses and an undergraduate hardware design course. The first is a junior-senior level elective course Object-Oriented Programming using C++ and Java. The second is a sophomore-level required course on Advanced C Programming. The third course, Introduction to Digital System Design, is also a sophomore-level required course. We will describe how parallel concepts have been integrated in the courses, the assessments, and the results.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115311009","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}