2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP): Latest Papers

Implementing the Open Community Runtime for Shared-Memory and Distributed-Memory Systems
J. Dokulil, Martin Sandrieser, S. Benkner
{"title":"Implementing the Open Community Runtime for Shared-Memory and Distributed-Memory Systems","authors":"J. Dokulil, Martin Sandrieser, S. Benkner","doi":"10.1109/PDP.2016.81","DOIUrl":"https://doi.org/10.1109/PDP.2016.81","url":null,"abstract":"The extreme scale, complexity and performance variability of future high performance computing systems pose many new challenges to parallel programming models and runtime systems. The Open Community Runtime (OCR) is a recent effort for a task-based runtime system for extreme scale parallel systems. We have implemented the OCR specification in a shared-memory environment on top of TBB, providing an alternative to the implementation created by the OCR consortium. We have created an experimental extension that supports parallel accelerators programmed with OpenCL. We also have an implementation that targets distributed-memory systems. Despite being in an early stage of development, our implementations can achieve reasonable performance with some applications. We describe the main aspects of our OCR implementations and report on early experimental results on shared-memory and distributed-memory systems.","PeriodicalId":192273,"journal":{"name":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134603892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 17
Towards a General Framework for Ensuring and Reusing Proofs of Termination Detection in Distributed Computing
Maha Boussabbeh, M. Tounsi, A. Kacem, M. Mosbah
{"title":"Towards a General Framework for Ensuring and Reusing Proofs of Termination Detection in Distributed Computing","authors":"Maha Boussabbeh, M. Tounsi, A. Kacem, M. Mosbah","doi":"10.1109/PDP.2016.113","DOIUrl":"https://doi.org/10.1109/PDP.2016.113","url":null,"abstract":"Distributed algorithms are designed to run on interconnected autonomous computing entities for achieving a common task: each entity executes asynchronously the same code and interacts locally with its immediate neighbours. It is widely agreed that the lack of knowledge of the global state makes termination detection one of the most important and complex problems in distributed computing. By relying on refinement, we prove that an algorithm computing a spanning tree with Local Termination Detection (each entity is able to determine only its own termination condition), can be reused and adapted in order to compute the same algorithm with Global Termination Detection (at least one entity is aware that the entire computation is achieved in the network). The main idea relies upon specifying a combination of a well known algorithm namely SSP and the spanning tree algorithm, following a top/down approach. This paper is a starting point towards a general framework for enhancing termination detection property of distributed algorithms and reusing their proofs.","PeriodicalId":192273,"journal":{"name":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116117666","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
A Time Synchronization Protocol for Modular Robots
André Naz, Benoît Piranda, S. Goldstein, J. Bourgeois
{"title":"A Time Synchronization Protocol for Modular Robots","authors":"André Naz, Benoît Piranda, S. Goldstein, J. Bourgeois","doi":"10.1109/PDP.2016.73","DOIUrl":"https://doi.org/10.1109/PDP.2016.73","url":null,"abstract":"In this paper, we propose the Modular Robot Time Protocol (MRTP), a network-wide time synchronization protocol for modular robots. Our protocol achieves its performance by combining several mechanisms: central time master election, low-level time-stamping and clock skew compensation using linear regression. We evaluate our protocol on the Blinky Blocks hardware. Experimental results show that MRTP can potentially manage real systems composed of up to 27,775 Blinky Blocks. We observe that the synchronization precision depends on the hardware, the hop distance to the time master, the synchronization periods and the number of synchronization points used for the linear regressions. Furthermore, we show that our protocol is able to keep a Blinky Blocks system synchronized to a few milliseconds, using few network resources at runtime, even-though the Blinky Blocks hardware clocks exhibit very poor accuracy and resolution.","PeriodicalId":192273,"journal":{"name":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124102059","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
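The MRTP abstract above names clock skew compensation using linear regression as one of the protocol's mechanisms. The following is a minimal sketch of that general idea, not the authors' implementation: a module collects (local clock, master clock) synchronization points and fits a least-squares line that maps its local time to the master's time. The function names `fit_clock` and `to_global` and the sample values are hypothetical.

```python
def fit_clock(samples):
    """Least-squares fit of master_time ~= skew * local_time + offset.

    `samples` is a list of (local_time, master_time) synchronization points.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two synchronization points")
    mean_local = sum(local for local, _ in samples) / n
    mean_master = sum(master for _, master in samples) / n
    # Sum of squared deviations of the local clock readings.
    sxx = sum((local - mean_local) ** 2 for local, _ in samples)
    if sxx == 0:
        raise ValueError("local timestamps must not all be identical")
    # Co-deviation between local and master readings.
    sxy = sum((local - mean_local) * (master - mean_master)
              for local, master in samples)
    skew = sxy / sxx
    offset = mean_master - skew * mean_local
    return skew, offset


def to_global(local_time, skew, offset):
    """Estimate the master's time for a given local clock reading."""
    return skew * local_time + offset


# Hypothetical readings in milliseconds: the local clock runs slightly slow.
skew, offset = fit_clock([(0, 100), (1000, 1101), (2000, 2102)])
estimate = to_global(3000, skew, offset)
```

Using more synchronization points in the fit smooths out per-message timestamp noise, which is consistent with the abstract's remark that precision depends on the number of points used for the regressions.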
Estimation Models for NoSQL Database Consistency Characteristics
A. Burdakov, Y. Grigorev, A. Ploutenko, Eugene Ttsviashchenko
{"title":"Estimation Models for NoSQL Database Consistency Characteristics","authors":"A. Burdakov, Y. Grigorev, A. Ploutenko, Eugene Ttsviashchenko","doi":"10.1109/PDP.2016.23","DOIUrl":"https://doi.org/10.1109/PDP.2016.23","url":null,"abstract":"This article considers NoSQL database replication problems. It analyzes the influence of the N, W, R replication parameters on the consistency characteristics of database record replicas (N -- the total number of one record's replicas, W -- number of replicas for write operation execution into a database, R -- number of replicas for record read operation execution from a database). It describes a developed model for eventual consistency (W+R ≤ N), obtaining probability estimate that during the process of N-W replica updates there will be at least one read request out of non-updated replicas. It also proposes a model for strong consistency of the replicas in NoSQL databases, which allows for estimation of random wait time of the read request for the record update completion. It describes the process for preparation and execution of experiments in the cloud for model calibration and its validation.","PeriodicalId":192273,"journal":{"name":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123032794","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
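The NoSQL abstract above describes the N, W, R replication parameters and treats W+R ≤ N as the eventual-consistency case, with N-W replicas still to be updated after a write is acknowledged. A small sketch of how those quantities relate is given below. The function names are hypothetical, and `all_stale_read_probability` is a toy calculation under a uniform-random replica choice; it is an illustrative assumption and not the estimation model developed in the paper.

```python
from math import comb


def consistency_mode(n, w, r):
    """Classify an (N, W, R) setting.

    Read and write replica sets are guaranteed to overlap only if W + R > N.
    """
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("need 1 <= W <= N and 1 <= R <= N")
    return "strong" if w + r > n else "eventual"


def pending_replicas(n, w):
    """Replicas that still have to be updated once W writes are acknowledged."""
    return n - w


def all_stale_read_probability(n, w, r):
    """Toy assumption: a read picks R replicas uniformly at random while
    N - W replicas are still non-updated; returns the chance that every
    replica it picks is a non-updated one."""
    return comb(n - w, r) / comb(n, r)


mode = consistency_mode(3, 1, 1)            # W + R = 2 <= N = 3
stale = pending_replicas(3, 1)              # two replicas lag behind
p_stale = all_stale_read_probability(3, 1, 1)
```

With W + R > N the toy probability is zero, because fewer than R replicas can be stale at once, which matches the intuition behind the strong-consistency case.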
Evaluation of the Memory Communication Traffic in a Hierarchical Cache Model for Massively-Manycore Processors
Sharifa Al Khanjari, W. Vanderbauwhede
{"title":"Evaluation of the Memory Communication Traffic in a Hierarchical Cache Model for Massively-Manycore Processors","authors":"Sharifa Al Khanjari, W. Vanderbauwhede","doi":"10.1109/PDP.2016.30","DOIUrl":"https://doi.org/10.1109/PDP.2016.30","url":null,"abstract":"The scaling of semiconductor technologies is leading to processors with increasing numbers of cores. A key enabler in manycore systems is the use of Networks-on-Chip (NoC) as a global communication mechanism. The adoption of NoCs in manycore systems requires a shift in focus from computation to communication, as communication is fast becoming the dominant factor in processor performance. Many researchers have focused on direct communication between cores in the NoC, however in a manycore processor the communication is actually between the cores and the memory hierarchy. In this work, we investigate the memory communication traffic of shared threads in a hierarchical cache architecture. We argue that the performance scalability for shared-memory applications in a hierarchical cache architecture for systems with thousands of processor cores depends on the distance between threads sharing memory in terms of the cache hierarchy (the \"memory distance\"). We present latency and throughput results comparing fat quadtree, concentrated mesh and mesh topologies as a function of the \"memory distance\" between the threads. Our results using the ITRS physical data for 2023 show that the model of thread placement and the distance of placing them significantly affects the NoC performance, and that scale-invariant topologies perform better than flat topologies.","PeriodicalId":192273,"journal":{"name":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","volume":"97 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121468446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
GPU-Accelerated Texture Analysis Using Steerable Riesz Wavelets
A. Vizitiu, L. Itu, Ranveer Joyseeree, A. Depeursinge, H. Müller, C. Suciu
{"title":"GPU-Accelerated Texture Analysis Using Steerable Riesz Wavelets","authors":"A. Vizitiu, L. Itu, Ranveer Joyseeree, A. Depeursinge, H. Müller, C. Suciu","doi":"10.1109/PDP.2016.105","DOIUrl":"https://doi.org/10.1109/PDP.2016.105","url":null,"abstract":"Visual pattern recognition is a key research topic in the field of image processing and computer vision. Texture analysis based on steerable Riesz wavelets is powerful, but requires computing pixel-wise operations resulting in a run time in the order of days when large volumes of data are processed. To overcome this limitation we propose a Graphics Processing Unit (GPU) based solution. A standard CPU version is used as starting point for the development of baseline GPU versions. To further increase the performance, and to overcome compute and memory limitations we apply a series of optimization techniques, leading to five versions in total. The best performing GPU solution ensures a speed-up of 93× for the parallelized section of the application and of 29.6× for the entire application. Furthermore, we show that a higher Riesz order and/or a higher image resolution further increases the speed-up.","PeriodicalId":192273,"journal":{"name":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125505908","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
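The GPU abstract above reports a 93× speed-up for the parallelized section and 29.6× for the entire application. As a rough back-of-the-envelope illustration, assuming Amdahl's law applies and the remaining code runs at its original speed (an assumption of this sketch, not a claim made in the abstract), the two figures imply what share of the original run time sat in the accelerated section. The function names are hypothetical.

```python
def amdahl_speedup(parallel_fraction, section_speedup):
    """Overall speed-up when a fraction of the run time is accelerated."""
    return 1.0 / ((1.0 - parallel_fraction)
                  + parallel_fraction / section_speedup)


def implied_parallel_fraction(overall_speedup, section_speedup):
    """Invert Amdahl's law: 1/S = 1 - p * (1 - 1/s), solved for p."""
    return (1.0 - 1.0 / overall_speedup) / (1.0 - 1.0 / section_speedup)


# Figures quoted in the abstract: 29.6x overall, 93x for the parallel section.
p = implied_parallel_fraction(29.6, 93.0)
check = amdahl_speedup(p, 93.0)  # reproduces the overall figure
```

Under these assumptions `p` comes out at approximately 0.977, so even a small serial remainder caps the overall gain well below the section speed-up.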
Reasoning about Fences and Relaxed Atomics
Mengda He, Viktor Vafeiadis, S. Qin, J. Ferreira
{"title":"Reasoning about Fences and Relaxed Atomics","authors":"Mengda He, Viktor Vafeiadis, S. Qin, J. Ferreira","doi":"10.1109/PDP.2016.103","DOIUrl":"https://doi.org/10.1109/PDP.2016.103","url":null,"abstract":"For efficiency reasons, weak (or relaxed) memory is now the norm on modern architectures. To cater for this trend, modern programming languages are adapting their memory models. The new C11 memory model [1] allows several levels of memory weakening, including non-atomics, relaxed atomics, release-acquire atomics, and sequentially consistent atomics. Under such weak memory models, multithreaded programs exhibit more behaviours, some of which would have been inconsistent under the traditional strong (i.e. sequentially consistent) memory model. This makes the task of reasoning about concurrent programs even more challenging. The GPS framework, recently developed by Turon et al.[22], has made a step forward towards tackling this challenge. By integrating ghost states, per-location protocols and separation logic, GPS can successfully verify programs with release-acquire atomics. In this paper, we present a program logic, an enhancement of the GPS framework, that can support the verification of a bigger class of C11 programs, that is, programs with release-acquire atomics, relaxed atomics and release-acquire fences. Key elements of our proposed logic include two new types of assertions, a more expressive resource model and a set of newly-designed verification rules.","PeriodicalId":192273,"journal":{"name":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","volume":"441 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134276467","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
Energy Efficient Scheduling of Real Time Signal Processing Applications through Combined DVFS and DPM
Erwan Nogues, M. Pelcat, D. Ménard, Alexandre Mercat
{"title":"Energy Efficient Scheduling of Real Time Signal Processing Applications through Combined DVFS and DPM","authors":"Erwan Nogues, M. Pelcat, D. Ménard, Alexandre Mercat","doi":"10.1109/PDP.2016.15","DOIUrl":"https://doi.org/10.1109/PDP.2016.15","url":null,"abstract":"This paper proposes a framework to design energy efficient signal processing systems. The energy efficiency is provided by combining Dynamic Frequency and Voltage Scaling (DVFS) and Dynamic Power Management (DPM). The framework is based on Synchronous Dataflow (SDF) modeling of signal processing applications. A transformation to a single rate form is performed to expose the application parallelism. An automated scheduling is then performed, minimizing the constraint of energy efficiency and providing DVFS and DPM decisions. This framework uses an architecture model including the number of available cores, the per-actor processing load and the energy per-cycle, derived from time and power measurements of modelled applications. After introducing the proposed framework, the energy characterization of big.LITTLE SoC systems is described. A generic approach is presented to generate the energy model of a platform from power measurements as customized polynomials. Finally, the experimental results on a Samsung Exynos 5410 big.LITTLE processor show that the energy optimal execution is not obtained by Linux governors that can execute either as-fast-as-possible or as-slow-as-possible. Instead, the most energy efficient scheduling is obtained by adapting both DVFS and DPM to application needs.","PeriodicalId":192273,"journal":{"name":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","volume":"196 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130727189","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
Predicting Performance and Power Consumption of Parallel Applications
D. D. Sensi
{"title":"Predicting Performance and Power Consumption of Parallel Applications","authors":"D. D. Sensi","doi":"10.1109/PDP.2016.41","DOIUrl":"https://doi.org/10.1109/PDP.2016.41","url":null,"abstract":"Current architectures provide many control knobs for the reduction of power consumption of applications, like reducing the number of used cores or scaling down their frequency. However, choosing the right values for these knobs in order to satisfy requirements on performance and/or power consumption is a complex task and trying all the possible combinations of these values is an unfeasible solution since it would require too much time. For this reasons, there is the need for techniques that allow an accurate estimation of the performance and power consumption of an application when a specific configuration of the control knobs values is used. Usually, this is done by executing the application with different configurations and by using these information to predict its behaviour when the values of the knobs are changed. However, since this is a time consuming process, we would like to execute the application in the fewest number of configurations possible. In this work, we consider as control knobs the number of cores used by the application and the frequency of these cores. We show that on most Parsec benchmark programs, by executing the application in 1% of the total possible configurations and by applying a multiple linear regression model we are able to achieve an average accuracy of 96% in predicting its execution time and power consumption in all the other possible knobs combinations.","PeriodicalId":192273,"journal":{"name":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121096555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 34
Using Nested Graphs to Distribute Parallel and Distributed Multi-agent Systems
A. Rousset, B. Herrmann, C. Lang, L. Philippe, Hadrien Bride
{"title":"Using Nested Graphs to Distribute Parallel and Distributed Multi-agent Systems","authors":"A. Rousset, B. Herrmann, C. Lang, L. Philippe, Hadrien Bride","doi":"10.1109/PDP.2016.91","DOIUrl":"https://doi.org/10.1109/PDP.2016.91","url":null,"abstract":"Simulation has become an indispensable tool for researchers to explore systems without having recourse to real experiments. In this context multi-agent systems are often used to model and simulate complex systems. Depending on the characteristics of the modelled system, methods used to represent the system may vary. Whatever the modelling techniques used, increasing the size and the precision of a model increases the amount of computation needed, requiring the use of parallel systems when it becomes too large. Usually, to efficiently run on parallel resources, the model must be adapted to be distributed. In this paper, we propose a new modelling approach, based on nested graphs, that allows the design of large, complex and multi-scale multi-agent models which can be efficiently distributed on parallel resources. A PDMAS (Parallel and Distributed Multi-Agent Platform) that supports this approach and efficiently run parallel multi-agent models is introduced.","PeriodicalId":192273,"journal":{"name":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126189693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2