Simon Holmbacka, J. Keller, Patrick Eitschberger, J. Lilius
{"title":"Accurate Energy Modelling for Many-Core Static Schedules","authors":"Simon Holmbacka, J. Keller, Patrick Eitschberger, J. Lilius","doi":"10.1109/PDP.2015.27","DOIUrl":"https://doi.org/10.1109/PDP.2015.27","url":null,"abstract":"Static schedules can be a preferable alternative for applications with timing requirements and predictable behavior since the processing resources can be more precisely allocated for the given workload. Unused resources are handled by power management systems to either scale down or shut off parts of the chip to save energy. In order to efficiently implement power management, especially in many-core systems, an accurate model is important in order to make the appropriate power management decisions at the right time. For making correct decisions, practical issues such as latency for controlling the power saving techniques should be considered when deriving the system model, especially for fine timing granularity. In this paper we present an accurate energy model for many-core systems which includes switching latency of modern power saving techniques. The model is used when calculating an optimal static schedule for many-core task execution on systems with dynamic frequency levels and sleep state mechanisms. We create the model parameters for an embedded processor, and we validate it in practice with synthetic benchmarks on real hardware.","PeriodicalId":285111,"journal":{"name":"2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"600 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116289715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Distributed Code Generation from Formal Models of Asynchronous Concurrent Processes","authors":"Hugues Evrard, Frédéric Lang","doi":"10.1109/PDP.2015.96","DOIUrl":"https://doi.org/10.1109/PDP.2015.96","url":null,"abstract":"Formal process languages inheriting the concurrency and communication features of process algebras are convenient formalisms to model distributed applications, especially when they are equipped with formal verification tools (e.g., model-checkers) to help hunting for bugs early in the development process. However, even starting from a fully verified formal model, bugs are likely to be introduced while translating (generally by hand) the concurrent model -- which relies on high-level and expressive communication primitives -- into the distributed implementation -- which often relies on low-level communication primitives. In this paper, we present DLC, a compiler that enables distributed code to be generated from models written in a formal process language called LNT, which is equipped with a rich verification toolbox named CADP. The generated code can be either executed in an autonomous way (i.e., without requiring additional code to be defined by the user), or connected to external software through user-modifiable C functions. We present an experiment where DLC generates a distributed implementation from the LNT model of the Raft consensus algorithm.","PeriodicalId":285111,"journal":{"name":"2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127591611","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Implementation of a Fast Viewshed Algorithm on SIMD Architectures","authors":"J. C. Bravo, T. Sarjakoski, J. Westerholm","doi":"10.1109/PDP.2015.62","DOIUrl":"https://doi.org/10.1109/PDP.2015.62","url":null,"abstract":"View shed refers to the land area that is visible to an observer placed in a point of a terrain. Due to the advances in remote sensing technologies the volume of data is today beyond the capability of traditional GIS tools and therefore new and fast algorithms become essential. In this paper we present an efficient implementation of the XDRAW algorithm [5] to quickly compute view sheds on very large digital elevation models. We redesign the algorithm to make it IO-efficient and compatible with modern SIMD architectures. Our implementation is able to compute view sheds on digital elevation models at the rate of 109 points per second on an Intel quad-core CPU with AVX2 technology, which makes the algorithm suitable for real-time applications.","PeriodicalId":285111,"journal":{"name":"2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125324205","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Iliasov, A. Rafiev, Fei Xia, Rem Gensh, A. Romanovsky, A. Yakovlev
{"title":"A Formal Specification and Prototyping Language for Multi-core System Management","authors":"A. Iliasov, A. Rafiev, Fei Xia, Rem Gensh, A. Romanovsky, A. Yakovlev","doi":"10.1109/PDP.2015.107","DOIUrl":"https://doi.org/10.1109/PDP.2015.107","url":null,"abstract":"We relate the experience of a defining a formal domain specific language (DSL) for the construction and reasoning about OS-level management logic of multi-core systems. The approach is based on a novel, iterative development principle where results of prototyping studies feed back into the next language revision. We illustrate the DSL with several examples of executable scripts.","PeriodicalId":285111,"journal":{"name":"2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128064290","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Dkhil, XuanKhanh Do, Stéphane Louise, Christine Rochange
{"title":"A Hybrid Scheduling Algorithm Based on Self-Timed and Periodic Scheduling for Embedded Streaming Applications","authors":"A. Dkhil, XuanKhanh Do, Stéphane Louise, Christine Rochange","doi":"10.1109/PDP.2015.109","DOIUrl":"https://doi.org/10.1109/PDP.2015.109","url":null,"abstract":"In this paper, we consider the problem of multiprocessor scheduling for safety-critical streaming applications modeled as acyclic data-flow graphs. To the best of our knowledge, most existing works have proposed periodic scheduling that ignore latency or can even have a negative impact on it: the results are quite far from those obtained under Self-Timed scheduling (STS). In this paper, we introduce a new scheduling policy noted Self-Timed Periodic (STP), which is an execution model combining self-timed scheduling with periodic scheduling. The proposed framework shows that the use of both strategies is possible and that they complement each other, STS improves the performance metrics of the programs, while the periodic model captures the timing aspects. We evaluate the performance of our scheduling policy for a set of 10 real-life streaming applications. We find that in most of the cases, our approach gives a significant improvement in latency compared to the Static Periodic Schedule (SPS), and results which are close to the best case latency of STS.","PeriodicalId":285111,"journal":{"name":"2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123524029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
C. Iakovidou, Loukas Bampis, S. Chatzichristofis, Y. Boutalis, A. Amanatiadis
{"title":"Color and Edge Directivity Descriptor on GPGPU","authors":"C. Iakovidou, Loukas Bampis, S. Chatzichristofis, Y. Boutalis, A. Amanatiadis","doi":"10.1109/PDP.2015.105","DOIUrl":"https://doi.org/10.1109/PDP.2015.105","url":null,"abstract":"Image indexing refers to describing the visual multimedia content of a medium, using high level textual information or/and low level descriptors. In most cases, images and videos are associated with noisy and incomplete user-supplied textual annotations, possibly due to omission or the excessive cost associated with the metadata creation. In such cases, Content Based Image Retrieval (CBIR) approaches are adopted and low level image features are employed for indexing and retrieval. We employ the Colour and Edge Directivity Descriptor (CEDD), which incorporates both colour and texture information in a compact representation and reassess it for parallel execution, utilizing the multicore power provided by General Purpose Graphic Processing Units (GPGPUs). Experiments conducted on four different combinations of GPU-CPU technologies revealed an impressive gained acceleration when using a GPU, which was up to 22 times faster compared to the respective CPU implementation, while real-time indexing was achieved for all tested GPU models.","PeriodicalId":285111,"journal":{"name":"2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134514506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xiaofan Zhang, M. Ebrahimi, Letian Huang, Guangjun Li, A. Jantsch
{"title":"A Routing-Level Solution for Fault Detection, Masking, and Tolerance in NoCs","authors":"Xiaofan Zhang, M. Ebrahimi, Letian Huang, Guangjun Li, A. Jantsch","doi":"10.1109/PDP.2015.87","DOIUrl":"https://doi.org/10.1109/PDP.2015.87","url":null,"abstract":"Faults may occur in numerous locations of a router in a NoC platform. Compared with the faults in the data path, faults in the control path may cause more severe effects which may result in crashing the entire system. Most of the current efforts in literature focus on disabling a router when a fault is detected. Considering this level of coarse-granularity, the functioning parts of a router have to be unnecessarily disabled which may severely affect the performance or functionality of the on-chip network. To cope with this problem, in this paper we propose a mechanism to tolerate faults in the control path which largely avoid disabling a router as long as the fault is not severe. This mechanism is called DMT, standing for three distinguishing characteristics of the proposed method as fault Detection, fault Masking and fault Tolerance. The proposed mechanism can efficiently detect the faults expressed as illegal turns while it has the capability to tolerate faults without a prior knowledge on where and why a fault has happened.","PeriodicalId":285111,"journal":{"name":"2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132348205","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimality of Fundamental Parallel Algorithms on the Hierarchical Memory Machine, with GPU Implementation","authors":"K. Nakano, Yasuaki Ito","doi":"10.1109/PDP.2015.46","DOIUrl":"https://doi.org/10.1109/PDP.2015.46","url":null,"abstract":"The Hierarchical Memory Machine (HMM) is a theoretical parallel computing model that captures the essence of CUDA-enabled GPU architecture. It has multiple streaming multiprocessors with a shared memory, and the global memory that can be accessed by all threads. The HMM has several parameters: the number d of streaming multiprocessors, the number p of threads per streaming multiprocessor, the number w of memory banks of each shared memory and the global memory, shared memory latency l, and global memory latency L. The main purpose of this paper is to discuss optimality of fundamental parallel algorithms running on the HMM. We first show that image convolution for an image with n × n pixels using a filter of size (2v+1) × (2v+1) can be done in O(n2/w+n2L/dp+n2v2/dw+n2v2l/dp) time units on the HMM. Further, we show that this parallel implementation is time optimal by proving the lower bound of the running time. We then go on to show that the product of two n × n matrices can be computed in O(n3/mw+n3L/mdp+n3/dw+n3l/dp) time units on the HMM if the capacity of the shared memory in each streaming multiprocessor is O(m2). This implementation is also proved to be time optimal. We further clarify the conditions for image convolution and matrix multiplication to hide the memory access latency overhead and to maximize the global memory throughput and the parallelism. Finally, we provide experimental results on GeForce GTX Titan to support our theoretical analysis.","PeriodicalId":285111,"journal":{"name":"2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115013211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Countermeasure Selection in SIEM Systems Based on the Integrated Complex of Security Metrics","authors":"Igor Kotenko, E. Doynikova","doi":"10.1109/PDP.2015.34","DOIUrl":"https://doi.org/10.1109/PDP.2015.34","url":null,"abstract":"The paper considers a technique for countermeasure selection in security information and event management (SIEM) systems. The developed technique is based on the suggested complex of security metrics. For the countermeasure selection the set of security metrics is extended with an additional level needed for security decision support. This level is based on the countermeasure effectiveness metrics. Key features of the suggested technique are application of the attack and service dependencies graphs, the introduced model of the countermeasure and the suggested metrics of the countermeasure effectiveness, cost and collateral damage. Other important feature of the technique is providing the solution on the countermeasure implementation in any time on the base of the current security state and security events.","PeriodicalId":285111,"journal":{"name":"2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116322346","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extending a Peer-Based Coordination Model with Composable Design Patterns","authors":"E. Kühn, Stefan Craß, Gerald Schermann","doi":"10.1109/PDP.2015.99","DOIUrl":"https://doi.org/10.1109/PDP.2015.99","url":null,"abstract":"Distributed applications require coordination of distributed software components in order to achieve a common goal. A coordination model that abstracts the complexity of network communication eases the development of such applications. The objective is to design collaboration with remote hosts in the same way as local interactions. Separation of coordination logic and application code increases maintainability, as components can be easily replaced with alternative versions. Different applications often have similar requirements on their coordination logic, e.g. concerning replication or load balancing. Instead of developing this functionality for each application separately, reusable generic patterns should be applied and adapted to the corresponding use case. In this paper, we describe how a coordination model based on asynchronous, data-driven communication among autonomous peers can be enhanced with a mechanism to support flexible coordination patterns. Patterns are adapted using parametrization and extension mechanisms, while complex coordination tasks can be modeled via composition of simpler sub-patterns. These concepts are demonstrated on an example where a MapReduce algorithm is incrementally designed and implemented using coordination patterns on top of a middleware that realizes the examined coordination model.","PeriodicalId":285111,"journal":{"name":"2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116353064","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}