{"title":"Towards resilient EU HPC systems: a blueprint","authors":"Petar Radojkovic","doi":"10.1145/3310273.3323434","DOIUrl":"https://doi.org/10.1145/3310273.3323434","url":null,"abstract":"In high-performance computing (HPC), a single tightly-coupled job may execute for days on thousands of servers. A server failure typically has cascading effects on the whole job, so redundancy and/or aggressive checkpointing are required to prevent the whole job from failing. This has an adverse impact on system performance and resource usage, which limits the ability to scale to larger systems. System resiliency is therefore one of the most important Exascale requirements and challenges.","PeriodicalId":431860,"journal":{"name":"Proceedings of the 16th ACM International Conference on Computing Frontiers","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125483574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A runtime-adaptive cognitive IoT node for healthcare monitoring","authors":"M. A. Scrugli, Daniela Loi, L. Raffo, P. Meloni","doi":"10.1145/3310273.3323160","DOIUrl":"https://doi.org/10.1145/3310273.3323160","url":null,"abstract":"Wearable and energy-efficient processing nodes, allowing for continuous remote monitoring of patient vital parameters, are mainstream in modern health-care practice. Most recent approaches to the development of such systems combine near-sensor data processing with cognitive computing, to improve at the same time communication efficiency, responsiveness and accuracy of the analysis of the sensed data. In this paper, we present a hardware-software architecture for a connected sensor-processing node that lets an external user remotely control the set of in-place processing tasks to be executed. The designed system is capable of dynamically adapting its operating point to the selected computational load, to minimize power consumption. The benefits of the proposed approach are tested on a use case involving ECG monitoring which, when selected, performs ECG classification using a lightweight convolutional neural network. Experimental results show that the proposed approach can reduce power consumption by more than 50% for common ECG activity, with less than 2% memory footprint overhead and a system reconfiguration time of less than 1 ms.","PeriodicalId":431860,"journal":{"name":"Proceedings of the 16th ACM International Conference on Computing Frontiers","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130767028","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A secure and authenticated host-to-memory communication interface","authors":"Niccolò Izzo, Alessandro Barenghi, L. Breveglieri, Gerardo Pelosi, P. Amato","doi":"10.1145/3310273.3323401","DOIUrl":"https://doi.org/10.1145/3310273.3323401","url":null,"abstract":"Emerging non-volatile memories (NVMs) have the potential to change the memory-storage hierarchy in computing devices, and even to replace DRAM as main memories. In fact, NVMs, besides offering byte-addressability and data persistence, promise better scalability and higher capacity than DRAM. However, from a security point of view, the persistent nature of emerging memories provides a larger time window to exfiltrate data from a device with respect to current DRAM-based main memories, and NVMs have in general lower write endurance than DRAM, thus requiring wear-out conscious encryption schemes. In this work we propose an architectural solution to secure non-volatile emerging memories, providing confidentiality, integrity and authenticity to the entire set of data, addresses and commands. Our solution relies on securing and authenticating the entire information transport between the host controller and the memory, enabling the storage of cleartext data inside the NVM. Such an approach retains the advantage of differential write strategies without forsaking security. We validate our proposed architecture through the simulation of a set of software benchmarks on an embedded architecture, employing the gem5 trace-based architectural simulator.","PeriodicalId":431860,"journal":{"name":"Proceedings of the 16th ACM International Conference on Computing Frontiers","volume":"516 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132967453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An hybrid approach to accelerate a molecular docking application for virtual screening in heterogeneous nodes: POSTER","authors":"E. Vitali, D. Gadioli, A. Beccari, C. Cavazzoni, C. Silvano, G. Palermo","doi":"10.1145/3310273.3323426","DOIUrl":"https://doi.org/10.1145/3310273.3323426","url":null,"abstract":"Molecular Docking is a crucial task in the process of Drug Discovery. This task consists in the estimation of the position of a molecule inside the docking site. It is used in the early stages of the drug discovery process to perform a virtual screening of a large library of molecule candidates. This task is usually performed using High Performance Computing platforms, due to the sheer number of candidates and the complexity of the docking problem. In this work we ported and optimized a Molecular Docking Module to a heterogeneous system with one or more GPGPU accelerators, leveraging the directive languages OpenMP and OpenACC. We show that with the proposed approach, we are able to reach a better utilization of the available resources compared to the usual CPU/GPU data splitting, reaching a 25% throughput improvement within the single node.","PeriodicalId":431860,"journal":{"name":"Proceedings of the 16th ACM International Conference on Computing Frontiers","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115505405","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quantum computing simulator on a heterogenous HPC system","authors":"J. Doi, Hitomi Takahashi, Raymond H. Putra, T. Imamichi, H. Horii","doi":"10.1145/3310273.3323053","DOIUrl":"https://doi.org/10.1145/3310273.3323053","url":null,"abstract":"Quantum computing simulation on a classical computer is difficult due to the exponential runtime and memory overhead. Previous work addresses the difficulty by utilizing multiple Graphical Processing Units (GPUs) and multi-node computers. GPUs are efficient for handling runtime issues but have limited total accessible memory space. Meanwhile, the memory of a multi-node computer can be scaled to the petabyte order, but its bandwidth for access from host computers (CPUs) is narrow. To simultaneously accelerate simulation and enlarge the total memory space, we propose a heterogeneous parallelization approach by combining GPUs and CPUs. Our simulator allocates memory to the GPUs first, and then to the CPUs. It thus accelerates simulation by using the full capabilities of the GPUs if memory for the simulation fits in the GPUs on a cluster. Allocating memory to the CPUs reduces benefits of the GPUs but enlarges the capacity of qubits in the simulation. In such a case, it can exploit the memory of the GPUs to add one more qubit in the simulation if the size of memory in a node is a power of two (such as 512GB). We show empirical performance evaluations of our simulator in a distributed environment of POWER9.","PeriodicalId":431860,"journal":{"name":"Proceedings of the 16th ACM International Conference on Computing Frontiers","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134407680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Next-generation arithmetic: major performance gains with minimal disruption","authors":"John L. Gustafson","doi":"10.1145/3310273.3324895","DOIUrl":"https://doi.org/10.1145/3310273.3324895","url":null,"abstract":"Moore's law made application developers lazy, since they could rely on increases in clock speeds and transistor density to improve the performance of their codes with little or no rewriting required. The frontiers of supercomputing, such as quantum computing, are certainly exciting and promising, but also highly disruptive... even more so than the shift from serial to parallel computing. They require a complete rewrite of millions of lines of software, and the invention of completely different algorithms.","PeriodicalId":431860,"journal":{"name":"Proceedings of the 16th ACM International Conference on Computing Frontiers","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134074650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analyzing the suitability of contemporary 3D-stacked PIM architectures for HPC scientific applications","authors":"I. Peng, J. Vetter, S. Moore, Joydeep Rakshit, S. Markidis","doi":"10.1145/3310273.3322831","DOIUrl":"https://doi.org/10.1145/3310273.3322831","url":null,"abstract":"Scaling off-chip bandwidth is challenging due to fundamental limitations, such as a fixed pin count and plateauing signaling rates. Recently, vendors have turned to 2.5D and 3D stacking to closely integrate system components. Interestingly, these technologies can integrate a logic layer under multiple memory dies, enabling computing capability inside a memory stack. This trend in stacking is making PIM architectures commercially viable. In this work, we investigate the suitability of offloading kernels in scientific applications onto 3D stacked PIM architectures. We evaluate several hardware constraints resulting from the stacked structure. We perform extensive simulation experiments and in-depth analysis to quantify the impact of application locality in TLBs, data caches, and memory stacks. Our results also identify design optimization areas in software and hardware for HPC scientific applications.","PeriodicalId":431860,"journal":{"name":"Proceedings of the 16th ACM International Conference on Computing Frontiers","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134304614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SPADA","authors":"F. B. Moreira, Daniel A. G. Oliveira, P. Navaux","doi":"10.1145/3310273.3321557","DOIUrl":"https://doi.org/10.1145/3310273.3321557","url":null,"abstract":"One of the main challenges in system security is the detection of vulnerability exploitation, especially valid control flow exploitation. The specificity of state-of-the-art methods, such as signature-based detection, becomes a limiting factor when detecting the latest exploits and attacks uncovered. We propose the detection of exploit executions by partitioning applications into phases, characterized by their Basic Block activity, and a phase behavior analysis. In contrast to previous works, our technique can detect exploits which use proper application control flows, such as Heartbleed. Moreover, our method identifies instances under attack using simple and statistically relevant phase features to profile control flow.","PeriodicalId":431860,"journal":{"name":"Proceedings of the 16th ACM International Conference on Computing Frontiers","volume":"448 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123276901","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Assessing transferability of adversarial examples against malware detection classifiers","authors":"Yixiang Wang, Jiqiang Liu, Xiaolin Chang","doi":"10.1145/3310273.3323072","DOIUrl":"https://doi.org/10.1145/3310273.3323072","url":null,"abstract":"Machine learning (ML) algorithms provide better performance than traditional algorithms in various applications. However, some unknown flaws in ML classifiers make them sensitive to adversarial examples, which are generated by adding small but purposeful distortions to natural examples in order to fool the classifier. This paper aims to investigate the transferability of adversarial examples generated on a sparse and structured dataset and the ability of adversarial training to resist adversarial examples. The results demonstrate that adversarial examples generated by a DNN can fool a set of ML classifiers such as decision tree, random forest, SVM, CNN and RNN. Also, adversarial training can improve the robustness of the DNN in terms of resisting attacks.","PeriodicalId":431860,"journal":{"name":"Proceedings of the 16th ACM International Conference on Computing Frontiers","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124097337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Capturing source code semantics via tree-based convolution over API-enhanced AST","authors":"Long Chen, Wei Ye, Shikun Zhang","doi":"10.1145/3310273.3321560","DOIUrl":"https://doi.org/10.1145/3310273.3321560","url":null,"abstract":"When deep learning meets big code, a key question is how to efficiently learn a distributed representation for source code that can capture its semantics effectively. We propose to use tree-based convolution over API-enhanced AST. To demonstrate the effectiveness of our approach, we apply it to detect semantic clones---code fragments with similar semantics but dissimilar syntax. Experimental results show that our approach outperforms an existing state-of-the-art approach that uses tree-based LSTM, with an increase of 0.39 and 0.12 in F1-score on OJClone and BigCloneBench respectively. We further propose architectures that incorporate our approach for code search and code summarization.","PeriodicalId":431860,"journal":{"name":"Proceedings of the 16th ACM International Conference on Computing Frontiers","volume":"20 3 Suppl 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132507084","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}