{"title":"Energy-Efficient Task Scheduling in Manycore Processors with Frequency Scaling Overhead","authors":"Patrick Eitschberger, J. Keller","doi":"10.1109/PDP.2015.64","DOIUrl":"https://doi.org/10.1109/PDP.2015.64","url":null,"abstract":"We investigate deadline scheduling of independent tasks on parallel processors with discrete frequency levels, when the latency for frequency scaling cannot be neglected. This situation frequently occurs in applications, e.g. streaming applications with soft real-time requirements. We demonstrate that previous algorithms for energy-optimal static scheduling of independent tasks are non-optimal in this setting. We present a scheduling heuristic based on bin packing with a cost function that takes latency for frequency scaling into account. We evaluate our heuristic against previous approaches with benchmark task sets and achieve energy reductions between 3% and 13%. We further demonstrate that for a concrete embedded multicore processor, the power curves vary over the identical cores, so that the processor looks heterogeneous from a power perspective. We adapt our bin packing heuristic and demonstrate that for the benchmark task sets, further energy reductions up to 4% can be achieved.","PeriodicalId":285111,"journal":{"name":"2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"63 9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116442662","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multicast On-chip Traffic Analysis Targeting Manycore NoC Design","authors":"S. Abadal, Albert Mestres, E. Alarcón, A. Cabellos-Aparicio, R. Martínez","doi":"10.1109/PDP.2015.26","DOIUrl":"https://doi.org/10.1109/PDP.2015.26","url":null,"abstract":"The scalability of Network-on-Chip (NoC) designs has become a rising concern as we enter the manycore era. Multicast support represents a particular yet relevant case within this context and has been the focus of different research efforts, mainly due to the poor performance of NoCs in the presence of this increasingly important type of traffic. However, most of the proposed schemes have been evaluated using synthetic traffic or within a full system, which is either unrealistic or costly. While traffic models would allow their performance to be assessed better, existing proposals do not distinguish between unicast and multicast flows and are often bound to a given number of cores. In this paper, a trace-based multicast traffic characterization is presented with the aim of providing guidelines for the modeling of multicast communications in manycore settings. To this end, the scaling trends of aspects such as the multicast traffic intensity or the spatiotemporal injection distribution are analyzed. The novelty of this work lies both in its scalability-oriented approach and in the use of correlation metrics to evaluate potential prediction opportunities.","PeriodicalId":285111,"journal":{"name":"2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133035818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generalized Extraction of Real-Time Parameters for Homogeneous Synchronous Dataflow Graphs","authors":"H. Ali, B. Akesson, L. M. Pinho","doi":"10.1109/PDP.2015.57","DOIUrl":"https://doi.org/10.1109/PDP.2015.57","url":null,"abstract":"Many embedded multi-core systems incorporate both dataflow applications with timing constraints and traditional real-time applications. Applying real-time scheduling techniques on such systems provides real-time guarantees that all running applications will execute safely without violating their deadlines. However, to apply traditional real-time scheduling techniques on such mixed systems, a unified model to represent both types of applications running on the system is required. Several earlier works have addressed this problem and solutions have been proposed that address acyclic graphs, implicit-deadline models or are able to extract timing parameters considering specific scheduling algorithms. In this paper, we present an algorithm for extracting real-time parameters (offsets, deadlines and periods) that are independent of the schedulability analysis, other applications running in the system, and the specific platform. The proposed algorithm: 1) enables applying traditional real-time schedulers and analysis techniques on cyclic or acyclic Homogeneous Synchronous Dataflow (HSDF) applications with periodic sources, 2) captures overlapping iterations, which is a main characteristic of the execution of dataflow applications, 3) provides a method to assign offsets and individual deadlines for HSDF actors, and 4) is compatible with widely used deadline assignment techniques, such as NORM and PURE. The paper proves the correctness of the proposed algorithm through formal proofs and examples.","PeriodicalId":285111,"journal":{"name":"2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116531509","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Channel Interface: A Primitive Model for Memory Efficient Communication","authors":"T. Nanri, T. Soga, Yuichiro Ajima, Yoshiyuki Morie, H. Honda, Taizo Kobayashi, T. Takami, S. Sumimoto","doi":"10.1109/PDP.2015.83","DOIUrl":"https://doi.org/10.1109/PDP.2015.83","url":null,"abstract":"Though the size of the system is getting larger towards exa-scale computation, the amount of available memory on computing nodes is expected to remain the same or to decrease. Therefore, memory efficiency is becoming an important issue for achieving scalability. This paper points out the problem of memory inefficiency in the de facto standard parallel programming model, Message Passing Interface (MPI). To solve this problem, the paper introduces the channel interface. This interface enables programmers to allocate and de-allocate channels appropriately so that the program consumes just enough memory for communication. In addition, by keeping the message transfer supported by a channel as simple as possible, the memory consumption and the overhead of handling messages with this interface can be kept minimal. The paper presents a sample implementation of this interface. The memory efficiency of the implementation is then examined using models of its memory consumption and performance.","PeriodicalId":285111,"journal":{"name":"2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"178 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116638431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Solutions for Processing K Nearest Neighbor Joins for Massive Data on MapReduce","authors":"Ge Song, Justine Rochas, F. Huet, F. Magoulès","doi":"10.1109/PDP.2015.79","DOIUrl":"https://doi.org/10.1109/PDP.2015.79","url":null,"abstract":"Given a point p and a set of points S, the kNN operation finds the k closest points to p in S. It is a computationally intensive task with a large range of applications such as knowledge discovery or data mining. However, as the volume and the dimension of data increase, only distributed approaches can perform such a costly operation in a reasonable time. Recent works have focused on implementing efficient solutions using the MapReduce programming model because it is suitable for large scale data processing. Also, it can easily be executed in a distributed environment. Although these works provide different solutions to the same problem, each one has particular constraints and properties. There is no readily available comparison to help users choose the one most appropriate for their needs. This is the problem we address in this work. Firstly, we show that all kNN implementations go through a common workflow, which we use as a basis for classification. Secondly, we describe precisely the different techniques published so far. And lastly, we provide a set of objective criteria that can be used to make informed decisions.","PeriodicalId":285111,"journal":{"name":"2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"55 88 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124776428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Simultaneous Optimisation of Task Mapping and Priority Assignment for Real-Time Embedded NoCs","authors":"M. Sayuti, L. Indrusiak","doi":"10.1109/PDP.2015.84","DOIUrl":"https://doi.org/10.1109/PDP.2015.84","url":null,"abstract":"In a hard real-time embedded system based on a fixed priority pre-emptive Network-on-Chip (NoC), the provision of guaranteed services may require pre-emption of some tasks and messages based on their priorities. In a worst case scenario, the interference imposed on low priority tasks can cause substantial computation and communication delays that can exceed their deadlines, leading to an unschedulable system. In a task mapping optimisation process, changing task mappings does not always produce a schedulable task mapping. In this paper, we propose an approach that simultaneously optimises task mapping and priority assignment, aiming to find a configuration that can completely satisfy the timing constraints of the system. Unlike the state of the art, our approach takes into account the overall schedulability of the system by considering the worst-case end-to-end response time of all mapped tasks. As a result, we are able to increase the quality of task mappings while at the same time improving the convergence of the optimisation algorithm, compared with previous approaches that focus solely on task mapping optimisation to make the system schedulable.","PeriodicalId":285111,"journal":{"name":"2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125122297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Portable Framework for Real-Time Parallel Image Processing on High Performance Embedded Platforms","authors":"Clemens Eisserer","doi":"10.1109/PDP.2015.31","DOIUrl":"https://doi.org/10.1109/PDP.2015.31","url":null,"abstract":"The trend towards efficient, yet more complex, multicore designs has also reached the world of Digital Signal Processors (DSP), a field where low-level programming has typically been prevalent. To overcome the additional complexity of programming multi-core and multi-chip DSP systems, we present an object-oriented framework for task-based parallel programming on the highly power efficient Texas Instruments TMS320C6678 platform. Our framework incorporates hardware architectural details of this platform such as DMA units in a high-level manner, while maintaining portability - guiding the path for algorithmic designers from PCs to embedded DSP platforms. The whole framework has been designed and implemented with real-time requirements and low overhead in mind, which is crucial for the acceptance of higher-level solutions on embedded systems.","PeriodicalId":285111,"journal":{"name":"2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125184181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Concentration and Its Impact on Mesh and Torus-Based NoC Performance","authors":"S. Loucif","doi":"10.1109/PDP.2015.35","DOIUrl":"https://doi.org/10.1109/PDP.2015.35","url":null,"abstract":"This paper investigates the effects of concentration on the performance of k-ary n-cubes. Simulation results indicate that only large ratios of packet length-to-average hop-count are in favor of concentrated mesh and torus. The Cmesh takes full advantage of its high channel bandwidth to outperform Ctorus. Moreover, non-local traffic suffers more from performance bottleneck than local traffic at routers. Providing dedicated input ports, one for each IP, at routers, reduces the average packet latency compared to a configuration with a single input port shared by all IP cores of the cluster.","PeriodicalId":285111,"journal":{"name":"2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115258974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Application of GPU Parallel Computing to Power Flow Calculation in HVDC Networks","authors":"Przemyslaw Blaskiewicz, M. Zawada, P. Balcerek, P. Dawidowski","doi":"10.1109/PDP.2015.110","DOIUrl":"https://doi.org/10.1109/PDP.2015.110","url":null,"abstract":"Numerical computation on GPUs has become easily accessible and offers good computation power for relatively little cost. Recently, an application of the Newton-Raphson method for analyzing power flow in multi-terminal high-voltage direct current (HVDC) networks was proposed and shown to have good results on five-terminal grids. Since this method involves costly matrix operations, especially inversion, increasing the number of terminals in the grid yields prohibitively large execution times in sequential operation. To address this issue, we adjust the algorithm so that it benefits from parallel computation and test our approach on a recent GPU from NVIDIA. We give experimental results for grids of up to a few thousand terminals and show that execution time is still acceptable for real applications. We also provide some benchmarks of the GPU computation compared with other platforms.","PeriodicalId":285111,"journal":{"name":"2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115362605","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Green Perspective on Structured Parallel Programming","authors":"M. Danelutto, M. Torquati, P. Kilpatrick","doi":"10.1109/PDP.2015.116","DOIUrl":"https://doi.org/10.1109/PDP.2015.116","url":null,"abstract":"Structured parallel programming, and in particular programming models using the algorithmic skeleton or parallel design pattern concepts, are increasingly considered to be the only viable means of supporting effective development of scalable and efficient parallel programs. Structured parallel programming models have been assessed in a number of works in the context of performance. In this paper we consider how the use of structured parallel programming models allows knowledge of the parallel patterns present to be harnessed to address both performance and energy consumption. We consider different features of structured parallel programming that may be leveraged to impact the performance/energy trade-off and we discuss a preliminary set of experiments validating our claims.","PeriodicalId":285111,"journal":{"name":"2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124611520","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}