{"title":"Energy-Aware Migration of Virtual Machines Driven by Predictive Data Mining Models","authors":"Albino Altomare, Eugenio Cesario, D. Talia","doi":"10.1109/PDP.2015.40","DOIUrl":"https://doi.org/10.1109/PDP.2015.40","url":null,"abstract":"Consolidation of virtual machines (VM) is one of the key strategies used to reduce the power consumption of Cloud servers. For this reason it is extensively studied. Nevertheless, the effectiveness of a consolidation strategy strongly depends on the forecast of the VM resource needs. This paper describes the design and development of a system for energy-aware allocation of virtual machines, driven by predictive data mining models. In particular, migrations are driven by the forecast of the future computational needs (CPU, RAM) of each virtual machine, in order to efficiently allocate those on the available servers. Experimental results, performed on data of a real Cloud data centre, show encouraging benefits in terms of energy saving.","PeriodicalId":285111,"journal":{"name":"2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122097856","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modeling an Improved Modified Type in Metallic Quantum-Dot Fixed Cell for Nano Structure Implementation","authors":"S. Sayedsalehi, A. Roohi","doi":"10.1109/PDP.2015.77","DOIUrl":"https://doi.org/10.1109/PDP.2015.77","url":null,"abstract":"Quantum-dot cellular automata (QCA) is a transistor-less computation approach which encodes binary information via configuration of charges among quantum dots. The fundamental QCA logic primitives are the majority gate and the inverter gate which can be employed to design various QCA circuits. In this study by applying some fixed predefined level of polarization, a detailed modeling of a modified type of fixed metal-dots QCA cell will be explored. An efficient architecture controlled by predefined polarization of fixed cells that position next to the input cells is presented for implementing a desired nano structure. The efficiency of the proposed approach is verified by implementing of several important examples of Boolean function.","PeriodicalId":285111,"journal":{"name":"2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122129097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Hybrid Parallel Implementation of Model Selection for Support Vector Machines","authors":"Giuseppe Ripepi, A. Clematis, D. D'Agostino","doi":"10.1109/PDP.2015.97","DOIUrl":"https://doi.org/10.1109/PDP.2015.97","url":null,"abstract":"The Model Selection (MS) is an important part of any statistical analysis, and for Support Vector Machine becomes crucial in order to reach the best performance. However, the MS is a compute intensive and non-convex problem, therefore an efficient parallelization is highly desirable. For this reason, in this work we compare two different approaches in MS parallelization, based on the use of MPI and hybrid MPI+OpenMP composition. Results show a clear and considerable advantage in using the latter solution.","PeriodicalId":285111,"journal":{"name":"2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"119 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123534585","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Energy Driven Adaptivity in Stream Parallel Computations","authors":"M. Danelutto, D. D. Sensi, M. Torquati","doi":"10.1109/PDP.2015.92","DOIUrl":"https://doi.org/10.1109/PDP.2015.92","url":null,"abstract":"Determining the right amount of resources needed for a given computation is a critical problem. In many cases, computing systems are configured to use an amount of resources to manage high load peaks even though this causes energy waste when the resources are not fully utilised. To avoid this problem, adaptive approaches are used to dynamically increase/decrease computational resources depending on the real needs. A different approach based on Dynamic Voltage and Frequency Scaling (DVFS) is emerging as a possible alternative solution to reduce energy consumption of idle CPUs by lowering their frequencies. In this work, we propose to tackle the problem in stream parallel computations by using both the classic adaptivity concepts and the possibility provided by modern CPUs to dynamically change their frequency. We validate our approach showing a real network application that performs Deep Packet Inspection over network traffic. We are able to manage bandwidth changing over time, guaranteeing minimal packet loss during reconfiguration and minimal energy consumption.","PeriodicalId":285111,"journal":{"name":"2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124755945","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Vector Implementation of Gaussian Elimination over GF(2): Exploring the Design-Space of Strassen's Algorithm as a Case Study","authors":"E. Morancho","doi":"10.1109/PDP.2015.24","DOIUrl":"https://doi.org/10.1109/PDP.2015.24","url":null,"abstract":"Gaussian elimination is a key algorithm in linear algebra. It has many usages, for instance solving systems of linear equations and determining whether a set of vectors is linearly independent. This algorithm transforms an input matrix into a matrix in row (column) echelon form. The matrix entries and the transformations are defined over algebraic fields either infinite (e.g. the real numbers) or finite (e.g. GF(2)). This work discusses a vector implementation of this algorithm over GF(2). The evaluation develops a case study that searches exhaustively for algorithms over GF(2) similar to Strassen's algorithm (a matrix-multiply algorithm with subcubic complexity) because the search engine requires solving a huge number of Gaussian eliminations over GF(2). Our vector implementation allows the search engine to complete the exploration in less than nine hours on a commodity processor supporting AVX2, outperforming by 1.92X a scalar-SWAR implementation specialized for the case study and by 7.43X a generic scalar-SWAR implementation. Our results show that, over GF(2), there are 20 algorithms similar to Strassen's.","PeriodicalId":285111,"journal":{"name":"2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121787219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lightweight ICT Approaches to Hydro-Meteorological Data Issues","authors":"A. Quarati, A. Clematis, Giacomo Paschina, A. Parodi, T. Bedrina","doi":"10.1109/PDP.2015.12","DOIUrl":"https://doi.org/10.1109/PDP.2015.12","url":null,"abstract":"Earth science disciplines have imaginary borders. Earth systems are connected and work integrally. Therefore for geosciences researchers it is critical to find, analyze and publish data across domains. Unfortunately, while searching for and accessing enormous amounts of heterogeneous data, scientists and professionals often face various types of obstacles that hinder their daily activity. Based on a survey on international initiatives in the field of Hydro-Meteorological research, the paper presents a lightweight approach to weather-data issues related to data access and retrieval, briefly describes the OGC proposal to cope with interoperability requirements, and introduces a technical solution to deal with high volumes of Hydro-Meteorological sensor data.","PeriodicalId":285111,"journal":{"name":"2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127346406","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TFluxSCC: Exploiting Performance on Future Many-Core Systems through Data-Flow","authors":"Andreas Diavastos, Giannos Stylianou, P. Trancoso","doi":"10.1109/PDP.2015.69","DOIUrl":"https://doi.org/10.1109/PDP.2015.69","url":null,"abstract":"The current trend in processor design is to increase the number of cores as to achieve a desired performance. While having a large number of cores on a chip seems to be feasible in terms of the hardware, the development of the software that is able to exploit that parallelism is one of the biggest challenges. In this paper we propose a Data-Flow based system that can be used to exploit the parallelism in large-scale many-core processors in an effective and efficient way. Our proposed system - TFlux SCC - is an extension of the TFlux Data-Driven Multithreading (DDM), which evolved to exploit the parallelism of the 48-core Intel Single-chip Cloud Computing (SCC) processor. With TFlux SCC we achieve scalable performance using a global address space without the need of cache-coherency support. Our scalability study shows that application's performance can scale, with speedup results reaching up to 48x for 48 cores. The findings of this work provide insight towards what a Data-Flow implementation requires and what not from a many-core architecture in order to scale the performance.","PeriodicalId":285111,"journal":{"name":"2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130368757","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design of a Cloud Service Middleware to Utilize Free Minutes of Public Cloud Resources","authors":"Sunirmal Khatua, N. Mukherjee","doi":"10.1109/PDP.2015.82","DOIUrl":"https://doi.org/10.1109/PDP.2015.82","url":null,"abstract":"Complexity of cloud services acts as a barrier towards adopting cloud to some of the Cloud Service Users. Cloud Service Middleware plays an important role to get rid of such problems. The middleware manages and optimizes the cloud resources to execute various jobs submitted by the users. A middleware can be enhanced to utilize the idle time of the reserved resources in cloud environment by scheduling these resources free of cost to jobs submitted by the same Cloud Service User (CSU) or a different CSU. This enhancement not only makes it possible to utilize the resources to their fullest extent, but also reduces the usage cost of the CSU who reserved the resources (or a different CSU in certain cases). However, finding the mapping between the jobs and available pool of resources is a key challenge to the design of a middleware. This paper proposes some scheduling algorithms to find such mappings that minimizes the job execution cost within public cloud.","PeriodicalId":285111,"journal":{"name":"2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"263 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129313672","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Derivation of Parallel and Resilient Programs from Simulink Models","authors":"S. Ostroumov, Pontus Boström, M. Waldén","doi":"10.1109/PDP.2015.102","DOIUrl":"https://doi.org/10.1109/PDP.2015.102","url":null,"abstract":"Modern embedded applications often require high computational power and, on the other hand, fulfilment of real-time constraints and high level of resilience. Simulink is one widely used tool for model-based development of embedded software. In this paper, we focus on the derivation of parallel programs from Simulink models and real-time resilient execution of derived implementations on a many-core platform. The main contribution is a fault-tolerance (FT) mechanism that prevents data loss when the platform is dynamically reconfigured to mask failures of individual cores. Finally, we evaluate the proposed solutions on an industrial case study using a commercially available NoC-based platform. The evaluation shows that the proposed FT mechanism has a marginal overhead.","PeriodicalId":285111,"journal":{"name":"2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115272503","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lattice Boltzmann Simulations at Petascale on Multi-GPU Systems with Asynchronous Data Transfer and Strictly Enforced Memory Read Alignment","authors":"F. Robertsen, J. Westerholm, K. Mattila","doi":"10.1109/PDP.2015.71","DOIUrl":"https://doi.org/10.1109/PDP.2015.71","url":null,"abstract":"The lattice Boltzmann method is a well-established numerical approach for complex fluid flow simulations. Recently general-purpose graphics processing units have become accessible as high-performance computing resources at large-scale. We report on implementing a lattice Boltzmann solver for multi-GPU systems that achieves 0.69 PFLOPS performance on 16384 GPUs. In addition to optimizing the data layout on the GPUs and eliminating the halo sites, we make use of the possibility to overlap data transfer between the host CPU and the device GPU with computing on the GPU. We simulate flow in porous media and measure both strong and weak scaling performance with the emphasis being on a large scale simulation using realistic input data.","PeriodicalId":285111,"journal":{"name":"2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"88 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122390822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}