{"title":"A Machine Learning Approach for Parameter Screening in Earthquake Simulation","authors":"Marisol Monterrubio Velasco, J. C. Carrasco-Jiménez, Octavio Castillo Reyes, F. Cucchietti, J. Puente","doi":"10.1109/CAHPC.2018.8645865","DOIUrl":"https://doi.org/10.1109/CAHPC.2018.8645865","url":null,"abstract":"Earthquakes are the result of rupture in the Earth's crust. The rupture process is difficult to model deterministically due to the number of unmeasurable parameters involved and poorly constrained physical conditions, as well as the very diverse scales involved in their nucleation (meters) and complete failure (up to hundreds of kilometers). In this research work we focus on synthetic seismic catalogs generated with a stochastic modeling technique called the Fiber Bundle Model (FBM). These catalogs can be readily compared with statistical measures computed from real earthquake series, but the link between the FBM parameters and the characteristics of the obtained earthquake series is difficult to assess. Furthermore, the stochastic nature of the model requires a large number of realizations in order to attain statistical robustness. The aim of this work is to estimate the FBM parameters that generate aftershock sequences that are similar to those generated by real seismic events. In order to estimate the optimal combination of parameters that generate such sequences, we executed a large number of simulations with different combinations of parameters using High-Performance Computing (HPC) resources to reduce compute time. Lastly, the synthetic datasets were used to train a supervised Machine Learning (ML) model that analyzes and extracts statistical patterns that reproduce the observations regarding aftershock occurrence and its spatio-temporal distribution in real seismic events.","PeriodicalId":307747,"journal":{"name":"2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123583720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Online Detection of Spectre Attacks Using Microarchitectural Traces from Performance Counters","authors":"Congmiao Li, J. Gaudiot","doi":"10.1109/CAHPC.2018.8645918","DOIUrl":"https://doi.org/10.1109/CAHPC.2018.8645918","url":null,"abstract":"To improve processor performance, computer architects have adopted such acceleration techniques as speculative execution and caching. However, researchers have recently discovered that this approach implies inherent security flaws, as exploited by Meltdown and Spectre. Attacks targeting these vulnerabilities can leak protected data through side channels such as data cache timing by exploiting mis-speculated executions. The flaws can be catastrophic because they are fundamental and widespread and they affect many modern processors. Mitigating the effect of Meltdown is relatively straightforward in that it entails a software-based fix which has already been deployed by major OS vendors. However, to this day, there is no effective mitigation for Spectre. Fixing the problem may require a redesign of the architecture for conditional execution in future processors. In addition, a Spectre attack is hard to detect using traditional software-based antivirus techniques because it does not leave traces in traditional log files. In this paper, we propose monitoring microarchitectural events, such as cache misses and branch mispredictions, from existing CPU performance counters to detect Spectre during attack runtime. Our detector was able to achieve 0% false negatives with less than 1% false positives using various machine learning classifiers with a reasonable performance overhead.","PeriodicalId":307747,"journal":{"name":"2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126253205","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Fault-Tolerant Agent-Based Architecture for Transient Servers in Fog Computing","authors":"J. P. A. Neto, D. Pianto, C. Ralha","doi":"10.1109/CAHPC.2018.8645859","DOIUrl":"https://doi.org/10.1109/CAHPC.2018.8645859","url":null,"abstract":"Cloud datacenters are exploring their idle resources and offering virtual machines as transient servers without availability guarantees. Spot instances are transient servers offered by Amazon AWS, with rules that define prices according to supply and demand. These instances will run for as long as the current price is lower than the maximum bid price given by users. Spot instances have been increasingly used for executing computation- and memory-intensive applications. By using dynamic fault-tolerant mechanisms and appropriate strategies, users can effectively use spot instances to run applications at a cheaper price. This paper presents a resilient multi-strategy agent-based cloud computing architecture. The architecture combines machine learning and a statistical model to predict instance survival times, refine fault tolerance parameters and reduce total execution time. We evaluate our strategies and the experiments demonstrate high levels of accuracy, reaching a 94% survival prediction success rate, which indicates that the model can be effectively used to define execution strategies to prevent failures at revocation events under realistic working conditions.","PeriodicalId":307747,"journal":{"name":"2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116600016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Variable-Size Batched Condition Number Calculation on GPUs","authors":"H. Anzt, J. Dongarra, Goran Flegar, Thomas Grützmacher","doi":"10.1109/CAHPC.2018.8645907","DOIUrl":"https://doi.org/10.1109/CAHPC.2018.8645907","url":null,"abstract":"We present a kernel that is designed to quickly compute the condition number of a large collection of tiny matrices on a graphics processing unit (GPU). The matrices can differ in size and the process integrates the use of pivoting to ensure a numerically-stable matrix inversion. The performance assessment reveals that, in double precision arithmetic, the new GPU kernel achieves up to 550 GFLOPs (billions of floating-point operations per second) and 800 GFLOPs on NVIDIA's P100 and V100 GPUs, respectively. The results also demonstrate a considerable speed-up with respect to a workflow that computes the condition number via launching a set of four batched kernels. In addition, we present a variable-size batched kernel for the computation of the matrix infinity norm. We show that this memory-bound kernel achieves up to 90% of the sustainable peak bandwidth.","PeriodicalId":307747,"journal":{"name":"2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)","volume":"261 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122689289","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Copyright","authors":"","doi":"10.1109/cahpc.2018.8645922","DOIUrl":"https://doi.org/10.1109/cahpc.2018.8645922","url":null,"abstract":"","PeriodicalId":307747,"journal":{"name":"2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115785940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multicore Performance Engineering of Sparse Triangular Solves Using a Modified Roofline Model","authors":"M. Wittmann, G. Hager, R. Janalík, M. Lanser, A. Klawonn, O. Rheinbach, O. Schenk, G. Wellein","doi":"10.1109/CAHPC.2018.8645938","DOIUrl":"https://doi.org/10.1109/CAHPC.2018.8645938","url":null,"abstract":"The Roofline model is widely used to visualize the performance of executed code together with the upper performance bounds given by the memory bandwidth and the processor peak performance. The model can thus provide an insightful visualization of bottlenecks. In this paper, we try to establish realistic bandwidth ceilings for the sparse triangular solve step of PARDISO, a leading sparse direct solver package, which is also part of the Intel MKL library. The performance of the forward and backward substitution process is analyzed and benchmarked for a representative set of sparse matrices on seven modern x86-type multicore architectures and the Knights Landing manycore architecture. It is shown how to accurately measure the necessary quantities also for threaded code, and the measurement approach, its validation, as well as limitations are discussed. Our modeling approach covers the serial and parallel execution phases, allowing for in-socket performance predictions.","PeriodicalId":307747,"journal":{"name":"2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126035664","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Energy Efficient Parallel K-Means Clustering for an Intel® Hybrid Multi-Chip Package","authors":"M. Souza, L. Maciel, Pedro Henrique Penna, H. Freitas","doi":"10.1109/CAHPC.2018.8645850","DOIUrl":"https://doi.org/10.1109/CAHPC.2018.8645850","url":null,"abstract":"FPGA devices have proven to be good candidates to accelerate applications from different research topics. For instance, machine learning applications such as K-Means clustering usually rely on large amounts of data to be processed, and, despite the performance offered by other architectures, FPGAs can offer better energy efficiency. With that in mind, Intel has launched a platform that integrates a multicore and an FPGA in the same package, enabling low latency and coherent fine-grained data offload. In this paper, we present a parallel implementation of the K-Means clustering algorithm for this novel platform, using the OpenCL language, and compare it against other platforms. We found that the CPU+FPGA platform was more energy efficient than the CPU-only approach by 70.71% and 85.92% with Standard and Tiny input sizes, respectively, and a performance improvement of up to 68.21% was obtained with the Tiny input size. Furthermore, it was up to 7.2× more energy efficient than an Intel® Xeon Phi™, 21.5× than a cluster of Raspberry Pi boards, and 3.8× than the low-power MPPA-256 architecture, when the Standard input size was used.","PeriodicalId":307747,"journal":{"name":"2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125216432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Serendipity: How Supercomputing Technology is Enabling a Revolution in Artificial Intelligence","authors":"José Moreira","doi":"10.1109/cahpc.2018.8645849","DOIUrl":"https://doi.org/10.1109/cahpc.2018.8645849","url":null,"abstract":"","PeriodicalId":307747,"journal":{"name":"2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132946896","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Program Committees","authors":"","doi":"10.1109/cahpc.2018.8645915","DOIUrl":"https://doi.org/10.1109/cahpc.2018.8645915","url":null,"abstract":"","PeriodicalId":307747,"journal":{"name":"2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133133034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Ray-Tracer Cloud Offloading in OPENMP","authors":"M. Mortatti, H. Yviquel, G. Araújo","doi":"10.1109/CAHPC.2018.8645871","DOIUrl":"https://doi.org/10.1109/CAHPC.2018.8645871","url":null,"abstract":"Rendering an image from a 3D scene requires a large amount of computation which grows exponentially with the complexity of the scene (e.g. number of objects and light sources). With the increasing demand for high-definition content, 3D designers need to use high-performance computer systems to keep the rendering time acceptable. Since owning computer clusters is expensive, designers usually rent computing power directly from cloud service providers (e.g., AWS and Azure). However, even though many cloud providers already offer dedicated rendering services, integrating them within the standard workflow of modeling software can become a complex and cumbersome task. It typically requires exporting the project from the design software, dealing with various access control mechanisms from different clouds to upload the project, and executing the rendering remotely through the command line. Offloading computation to the cloud is a technique which can considerably simplify such tasks. To achieve that, this paper uses an extension of OpenMP 4.X to eliminate any major interactions with the end-user, while minimizing the complexity of cloud integration and optimizing the design workflow. It applies this approach to a ray-tracing application, a simplified version of the engines used by professional 3D modeling software (e.g. Blender). It automatically offloads the rendering process from the user computer to a computer cluster within the Microsoft Azure cloud, brings the resulting images back after the computation ends and displays them directly on the screen of the user computer, thus providing a transparent programming model and good speed-ups over local execution.","PeriodicalId":307747,"journal":{"name":"2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123782641","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}