{"title":"Optimized Belief Propagation Algorithm onto Embedded Multi and Many-Core Systems for Stereo Matching","authors":"J. Nezan, Alexandre Mercat, P. Delmas, G. Gimel'farb","doi":"10.1109/PDP.2016.52","DOIUrl":"https://doi.org/10.1109/PDP.2016.52","url":null,"abstract":"Stereo matching techniques aim at reconstructing disparity maps from a pair of images. The use of stereo matching techniques in embedded systems is very challenging due to the complexity of the state-of-the-art algorithms. Local stereo matching algorithms are efficiently implemented on GPU and DSP. This paper presents the optimization of the One Dimension Belief Propagation (BP-1D) algorithm. BP-1D is faster than previous algorithms on monocore DSP and its implementation onto multicore DSPs is straightforward. BP-1D implemented on multicore embedded platforms outperforms previous stereo matching implementations, reaching real-time performance for resolutions up to 1080p with a power consumption of 10 W.","PeriodicalId":192273,"journal":{"name":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129349748","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gate Merging: An NBTI Mitigation Method to Eliminate Critical Internal Nodes in Digital Circuits","authors":"Maryam Ghane, H. Zarandi","doi":"10.1109/PDP.2016.90","DOIUrl":"https://doi.org/10.1109/PDP.2016.90","url":null,"abstract":"This paper presents a method to mitigate Negative Bias Temperature Instability (NBTI) in digital circuits. Since the effect of NBTI strongly depends on the logic values of internal nodes, the method uses internal nodes control (INC) to reduce the number of NBTI-critical transistors. Some internal nodes in digital circuits are under severe NBTI stress. First, the method identifies NBTI-critical internal nodes in critical and non-critical paths by calculating their probability of being under NBTI stress. Second, it eliminates these internal nodes by combining NBTI-sensitive gates with their driver gates, generating a new complex gate. These complex gates have the same logic and remove any NBTI-critical transistors. The proposed method reduces NBTI in combinational and sequential CMOS circuits and increases their lifetime. Experimental results on ISCAS'89 benchmark circuits show that NBTI-critical transistors, NBTI-induced delay degradation and the number of transistors in the circuit are decreased by about 86.1%, 15.12% and 4.3%, respectively. However, this method imposes an area overhead of 0.2% for the investigated circuits.","PeriodicalId":192273,"journal":{"name":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124532980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Neighbor Detection Based on Multiple Virtual Mobile Nodes","authors":"Behnaz Bostanipour, B. Garbinato","doi":"10.1109/PDP.2016.43","DOIUrl":"https://doi.org/10.1109/PDP.2016.43","url":null,"abstract":"We introduce an algorithm that implements a time-limited neighbor detector service in mobile ad hoc networks. The time-limited neighbor detector enables a mobile device to detect nearby devices in the past, present and up to some bounded time interval in the future. In particular, it can be used by a new trend of mobile applications known as proximity-based mobile applications. To implement the neighbor detector, our algorithm uses n = 2^k virtual mobile nodes where k is a non-negative integer. A virtual mobile node is an abstraction that is akin to a mobile node that travels in the network in a predefined trajectory. In practice, it can be implemented by a set of mobile nodes based on a replicated state machine approach. Our algorithm implements the neighbor detector for nodes located in a circular region. We assume that each node can accurately predict its own locations up to some bounded time interval Δpredict in the future. The key idea of the algorithm is that the virtual mobile nodes regularly collect location predictions of nodes from different subregions, meet to share what they have collected with each other and then distribute the collected location predictions to nodes. Thus, each node can use the distributed location predictions for neighbor detection. We show that our algorithm is correct under certain conditions. Compared to a solution that works with a single virtual mobile node, our algorithm has a main advantage: as n grows, it remains correct with smaller values of Δpredict. This feature makes the real world implementation of the neighbor detector more feasible. In fact, although there exist different approaches to predict the future locations of a node, usually predictions tend to become less accurate as Δpredict increases.","PeriodicalId":192273,"journal":{"name":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128705810","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exact Vs. Approximated Diameter Calculation in Large Graphs","authors":"Francisco Sanches Banhos Filho, Eduardo Javier Huerta Yero","doi":"10.1109/PDP.2016.71","DOIUrl":"https://doi.org/10.1109/PDP.2016.71","url":null,"abstract":"A graph is a mathematical abstraction commonly used to represent relationships among a finite set of entities, such as hypertext documents or users in a social network. With the recent explosion of online content, the size and number of available graphs have increased as well, prompting research for efficient and scalable methods to process them in a timely fashion. This paper focuses on the calculation of the diameter of a graph, a well-known and relevant metric whose calculation poses a remarkable computational challenge for large graphs. We selected three algorithms based on two popular computing models: MapReduce and Bulk Synchronous Parallel (BSP). Two of the algorithms are based on MapReduce and calculate the exact and an approximated value for the graph diameter. The third algorithm is based on BSP and produces the exact value for the diameter. Our tests show that the approximated MapReduce solution produces the best combination of execution time and scalability, although it is outperformed in some cases by the exact BSP solution.","PeriodicalId":192273,"journal":{"name":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121634575","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Globally Asynchronous Locally Synchronous Simulation of NoCs on Many-Core Architectures","authors":"Marcus Eggenberger, Manuel Strobel, M. Radetzki","doi":"10.1109/PDP.2016.118","DOIUrl":"https://doi.org/10.1109/PDP.2016.118","url":null,"abstract":"We evaluate the applicability of many-core architectures for the simulation of networks on chips (NoC). Compared to the well established shared memory multi-core architectures, many-core architectures significantly differ not only in the number of processing elements but also in the on-chip communication architecture, the memory subsystem, and the computational performance of an individual core. Proven multi-core simulation approaches do not consider such architectural aspects and thus suffer limited performance when being applied to many-core architectures. To enable high performance simulation, we identify conceptual drawbacks of state of the art parallel simulation approaches and consequently propose a novel globally asynchronous locally synchronous (GALS) simulation concept suited for many-core architectures. Our results show that our GALS simulation approach yields a speedup of up to 2.3 over parallel discrete event simulation.","PeriodicalId":192273,"journal":{"name":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125693215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Formal Analysis and Model Checking of a Group Authentication Protocol by Scyther","authors":"Huihui Yang, A. Prinz, V. Oleshchuk","doi":"10.1109/PDP.2016.27","DOIUrl":"https://doi.org/10.1109/PDP.2016.27","url":null,"abstract":"Scyther [1] is designed to check the security and vulnerabilities of security protocols. In this paper, we use Scyther to analyze two discrete logarithm problem (DLP) based group authentication protocols proposed in [2]. These two protocols are claimed to satisfy several security requirements, but only some of them have been checked because of the properties and limitations of Scyther. Some positive results have been obtained, showing that the protocols provide mutual authentication and implicit key authentication and are secure against impersonation attacks. An important innovation in this paper is that we have extended the expressive ability of Scyther by making some reasonable assumptions during the analysis and security checking.","PeriodicalId":192273,"journal":{"name":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126802613","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Cost Model for Virtual Machine Storage in Cloud IaaS Context","authors":"Hamza Ouarnoughi, Jalil Boukhobza, Frank Singhoff, S. Rubini","doi":"10.1109/PDP.2016.119","DOIUrl":"https://doi.org/10.1109/PDP.2016.119","url":null,"abstract":"This paper proposes a storage system cost model for Infrastructure as a Service (IaaS) Cloud. The proposed cost model takes into account the virtualization environment, the storage system characteristics in addition to energy and QoS related parameters (Service Level Agreement and penalties). We show that those parameters are relevant and allow us to predict an accurate estimation of the overall cost of the IaaS infrastructure. We validate this cost model against real measures and we show less than 10% of error in most cases. Designers and administrators can use this cost model to perform optimization, load balancing, configuration and pricing of the Cloud infrastructure.","PeriodicalId":192273,"journal":{"name":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127370101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Efficient Soft Error Detection in Multicore Processors Running Server Applications","authors":"A. Tajary, H. Zarandi","doi":"10.1109/PDP.2016.100","DOIUrl":"https://doi.org/10.1109/PDP.2016.100","url":null,"abstract":"In this paper, a throughput-aware transient fault detection method is presented with respect to the features of server processors. The proposed method takes advantage of a combination of reconfigurable redundant execution-based fault detection and speculative fault detection. The reconfigurable redundant execution-based fault detection method uses a configuration manager module to couple two free adjacent cores on which a thread will be executed, and decouples them when resources are limited for normal execution. This method exploits unused resources in multi-core processors to ensure high-throughput reliable execution. The speculative fault detection method uses a history of block addresses requested from L1 cache to L2 cache during thread execution to find abnormal execution behavior. In order to evaluate the proposed method, the Alpha processor model is utilized in the context of the Gem5 simulator. The experimental results showed that 70% of injected faults can be detected with negligible hardware overhead.","PeriodicalId":192273,"journal":{"name":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125915411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HPSVM: Heterogeneous Parallel SVM with Factorization Based IPM Algorithm on CPU-GPU Cluster","authors":"Tao Li, Xuechen Liu, Qiankun Dong, Wenjing Ma, Kai Wang","doi":"10.1109/PDP.2016.29","DOIUrl":"https://doi.org/10.1109/PDP.2016.29","url":null,"abstract":"Support vector machine (SVM) is a supervised method widely used in the statistical classification and regression analysis. SVM training can be solved via the interior point method (IPM) with the advantages of low storage, fast convergence and easy parallelization. However, it is still confronted with the challenges of training speed and memory use. In this paper, we propose a parallel primal-dual IPM algorithm based on the incomplete Cholesky factorization (ICF) for efficiently training large-scale SVMs, named HPSVM, on CPU-GPU cluster. Our approach is distinguished from earlier work in that it is specifically designed to take maximal advantage of the CPU-GPU collaborative computation with the dual buffers 3-stage pipeline mechanism, and efficiently handles large-scale training datasets. In HPSVM, the heterogeneous hierarchical memory is fully explored to alleviate the bottleneck for optimizing data transfer, and the programming paradigm is presented to build an efficient collaboration mechanism between CPU and GPU. Comprehensive experiments show that HPSVM is up to 11 times faster than the CPU version on real datasets.","PeriodicalId":192273,"journal":{"name":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126711432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Clustering Effects on the Design of Opto-Electrical Network-on-Chip","authors":"Meisam Abdollahi, Alireza Namazi, S. Mohammadi","doi":"10.1109/PDP.2016.126","DOIUrl":"https://doi.org/10.1109/PDP.2016.126","url":null,"abstract":"Emerging nanoscale silicon photonics, with its advances in fabrication and integration of on-chip CMOS-compatible optical elements, is good news for system designers. Optical Networks-on-Chip (ONoCs) could be the next generation of NoCs. On the other hand, hybrid opto-electrical networks may provide higher bandwidth, lower latency and better power dissipation when considering both optical and electrical characteristics on multicore platforms. The cluster-based technique locally connects processing cores through electrical interconnect, while the clusters themselves are connected together through an optical waveguide. The experimental results show that in most benchmark applications, a cluster size of 4 proves to be an appropriate size for optimizing the energy-delay product (EDP) parameter.","PeriodicalId":192273,"journal":{"name":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131300507","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}