Halima Bouzidi, Hamza Ouarnoughi, S. Niar, Abdessamad Ait El Cadi
{"title":"Performance prediction for convolutional neural networks on edge GPUs","authors":"Halima Bouzidi, Hamza Ouarnoughi, S. Niar, Abdessamad Ait El Cadi","doi":"10.1145/3457388.3458666","DOIUrl":"https://doi.org/10.1145/3457388.3458666","url":null,"abstract":"Edge computing is increasingly used for Artificial Intelligence (AI) purposes to meet latency, privacy, and energy challenges. Convolutional Neural Networks (CNNs) are increasingly deployed on Edge devices for several applications. However, due to their constrained computing resources and energy budget, Edge devices struggle to meet CNN latency requirements while maintaining good accuracy. It is, therefore, crucial to choose the CNN with the best accuracy and latency trade-off while respecting hardware constraints. This paper presents and compares five widely used Machine Learning (ML) based approaches to predict CNN inference execution time on Edge GPUs. For these five methods, in addition to their prediction accuracy, we also explore the time needed for their training and their hyperparameter tuning. Finally, we compare the time needed to run the prediction models on different platforms. The use of these methods will greatly facilitate design space exploration by quickly providing the best CNN on a target Edge GPU. Experimental results show that XGBoost provides an interesting average prediction error even for unexplored and unseen CNN architectures. Random Forest achieves comparable accuracy but needs more effort and time to be trained. The other three approaches (OLS, MLP, and SVR) are less accurate for CNN performance estimation.","PeriodicalId":136482,"journal":{"name":"Proceedings of the 18th ACM International Conference on Computing Frontiers","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123846201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Santiago Rodrigo, Medina Bandic, S. Abadal, Hans van Someren, E. Alarcón, C. G. Almudever
{"title":"Scaling of multi-core quantum architectures: a communications-aware structured gap analysis","authors":"Santiago Rodrigo, Medina Bandic, S. Abadal, Hans van Someren, E. Alarcón, C. G. Almudever","doi":"10.1145/3457388.3458674","DOIUrl":"https://doi.org/10.1145/3457388.3458674","url":null,"abstract":"In the quest for large-scale quantum computers, multi-core distributed architectures are considered a compelling alternative to be explored. A crucial aspect of such an approach is the stringent demand on communication among cores when qubits need to interact, which conditions the scalability potential of these architectures. In this work, we address the question of how the cost of communication among cores impacts the viability of the quantum multi-core approach. Methodologically, we consider a design space in which architectural variables (number of cores, number of qubits per core), application variables for several quantum benchmarks (number of qubits, number of gates, percentage of two-qubit gates) and inter-core communication latency are swept along with the definition of a figure of merit. This approach yields both a qualitative understanding of trends in the design space and companion dimensioning guidelines for the architecture, including optimal points, as well as quantitative answers to the question of beyond which communication performance levels the multi-core architecture pays off. Our results make it possible to determine the inter-core communication latency thresholds that multi-core architectures must meet to outperform single-core quantum processors.","PeriodicalId":136482,"journal":{"name":"Proceedings of the 18th ACM International Conference on Computing Frontiers","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116031849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Marco Aldinucci, G. Agosta, A. Andreini, C. Ardagna, Andrea Bartolini, A. Cilardo, Biagio Cosenza, M. Danelutto, Roberto Esposito, W. Fornaciari, R. Giorgi, D. Lengani, R. Montella, M. Olivieri, S. Saponara, D. Simoni, M. Torquati
{"title":"The Italian research on HPC key technologies across EuroHPC","authors":"Marco Aldinucci, G. Agosta, A. Andreini, C. Ardagna, Andrea Bartolini, A. Cilardo, Biagio Cosenza, M. Danelutto, Roberto Esposito, W. Fornaciari, R. Giorgi, D. Lengani, R. Montella, M. Olivieri, S. Saponara, D. Simoni, M. Torquati","doi":"10.1145/3457388.3458508","DOIUrl":"https://doi.org/10.1145/3457388.3458508","url":null,"abstract":"High-Performance Computing (HPC) is one of the strategic priorities for research and innovation worldwide due to its relevance for industrial and scientific applications. We envision HPC as composed of three pillars: infrastructures, applications, and key technologies and tools. While infrastructures are by construction centralized in large-scale HPC centers, and applications are generally within the purview of domain-specific organizations, key technologies fall in an intermediate case where coordination is needed, but design and development are often decentralized. A large group of Italian researchers has started a dedicated laboratory within the National Interuniversity Consortium for Informatics (CINI) to address this challenge. The laboratory, albeit young, has managed to succeed in its first attempts to propose a coordinated approach to HPC research within the EuroHPC Joint Undertaking, participating in five successful proposals in the 2019-20 calls, for an aggregate total cost of 95M€. In this paper, we outline the working group's scope and goals and provide an overview of the five funded projects, which become fully operational in March 2021, and cover a selection of key technologies provided by the working group partners, highlighting their use and development within the projects.","PeriodicalId":136482,"journal":{"name":"Proceedings of the 18th ACM International Conference on Computing Frontiers","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123556894","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On resilience of security-oriented error detecting architectures against power attacks: a theoretical analysis","authors":"O. Keren, I. Polian","doi":"10.1145/3457388.3458867","DOIUrl":"https://doi.org/10.1145/3457388.3458867","url":null,"abstract":"It has been previously shown that hardware implementation of fault attack countermeasures based on error-detecting codes (EDCs) can make the circuit more vulnerable to power analysis attacks. We revisit this finding and show that the hypothesis space can grow significantly when a state-of-the-art security-oriented robust EDC is properly crafted. We use the Roth-Karp decomposition as an analytical tool to prove that by a simple re-ordering of the EDC's bits, the number of extra bits needed to formulate the hypotheses becomes so large that power analysis (that tries to exploit additional information from the redundant bits) is rendered infeasible.","PeriodicalId":136482,"journal":{"name":"Proceedings of the 18th ACM International Conference on Computing Frontiers","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128331141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Chazapis, Jean-Thomas Acquaviva, A. Bilas, G. Gardikis, C. Kozanitis, S. Louloudakis, H. Nguyen, Christian Pinto, A. Scharl, D. Soudris
{"title":"EVOLVE: HPC and cloud enhanced testbed for extracting value from large-scale diverse data","authors":"A. Chazapis, Jean-Thomas Acquaviva, A. Bilas, G. Gardikis, C. Kozanitis, S. Louloudakis, H. Nguyen, Christian Pinto, A. Scharl, D. Soudris","doi":"10.1145/3457388.3458621","DOIUrl":"https://doi.org/10.1145/3457388.3458621","url":null,"abstract":"EVOLVE is a pan-European Innovation Action building a converged infrastructure to bring together the HPC, Cloud, and Big Data worlds. EVOLVE's platform and software stack support large-scale, data-intensive applications, driven primarily by industry requirements set by pilot and proof-of-concept use cases from diverse fields. Given the unprecedented data growth we are experiencing, EVOLVE's infrastructure is key in enabling the cost-effective processing of massive amounts of data and the adaptation of multiple high-end technologies, in an environment that fosters interoperability and enforces increased security.","PeriodicalId":136482,"journal":{"name":"Proceedings of the 18th ACM International Conference on Computing Frontiers","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124177119","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chen Zhou, Hao Tian, Hong Zhang, Jin Zhang, M. Dong, Juncheng Jia
{"title":"TEA-fed: time-efficient asynchronous federated learning for edge computing","authors":"Chen Zhou, Hao Tian, Hong Zhang, Jin Zhang, M. Dong, Juncheng Jia","doi":"10.1145/3457388.3458655","DOIUrl":"https://doi.org/10.1145/3457388.3458655","url":null,"abstract":"Federated learning (FL) has attracted more and more attention recently. The integration of FL and edge computing makes the edge system more efficient and intelligent. FL usually uses the server to actively select certain edge devices to participate in the global model training. However, the selected edge devices may be stragglers, or even crash during training. Meanwhile, the unselected idle edge devices cannot be fully utilized for training. Therefore, besides the widely studied communication efficiency and data heterogeneity issues in FL, we also take the above time efficiency into consideration, and propose a time-efficient asynchronous federated learning protocol, TEA-Fed, to solve these problems. With TEA-Fed, idle edge devices actively apply for training tasks and participate in model training asynchronously once tasks are assigned. Considering that there may be a huge number of edge devices in edge computing, we introduce control parameters to limit the number of devices participating in training the same model at the same time. Meanwhile, we also introduce a caching mechanism and weighted averaging with respect to model staleness in the model aggregation step to reduce the adverse effects of model staleness and further improve the accuracy of the global model. Finally, the experimental results show that the protocol accelerates the convergence of model training, improves accuracy, and is robust to heterogeneous data.","PeriodicalId":136482,"journal":{"name":"Proceedings of the 18th ACM International Conference on Computing Frontiers","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132759401","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Poona Bahrebar, Leon Denis, Maxim Bonnaerens, Kristof Coddens, J. Dambre, W. Favoreel, I. Khvastunov, A. Munteanu, Hung Nguyen-Duc, S. Schulte, D. Stroobandt, Ramses Valvekens, N. V. D. Broeck, Geert Verbruggen
{"title":"cREAtIve: reconfigurable embedded artificial intelligence","authors":"Poona Bahrebar, Leon Denis, Maxim Bonnaerens, Kristof Coddens, J. Dambre, W. Favoreel, I. Khvastunov, A. Munteanu, Hung Nguyen-Duc, S. Schulte, D. Stroobandt, Ramses Valvekens, N. V. D. Broeck, Geert Verbruggen","doi":"10.1145/3457388.3458857","DOIUrl":"https://doi.org/10.1145/3457388.3458857","url":null,"abstract":"cREAtIve targets the development of novel highly-adaptable embedded deep learning solutions for automotive and traffic monitoring applications, including position sensor processing, scene interpretation based on LiDAR, and object detection and classification in thermal images for traffic camera systems. These applications share the need for deep learning solutions tailored for deployment on embedded devices with limited resources and featuring high adaptability and robustness to changing environmental conditions. cREAtIve develops knowledge, tools and methods that enable hardware-efficient, adaptable, and robust deep learning.","PeriodicalId":136482,"journal":{"name":"Proceedings of the 18th ACM International Conference on Computing Frontiers","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116169404","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Intelligent UAV-aided controller placement scheme for software-defined vehicular networks","authors":"Na Lin, Qi Zhao, Liang Zhao","doi":"10.1145/3457388.3458809","DOIUrl":"https://doi.org/10.1145/3457388.3458809","url":null,"abstract":"Recently, researchers have used long short-term memory (LSTM) networks and the bi-directional long short-term memory (Bi-LSTM) networks to process sequence data sets such as vehicle positions in software-defined vehicular networks (SDVN). In this paper, we present a three-component intelligent UAV-aided controller placement scheme (CPP) for SDVN. First, we use Bi-LSTM to model the real-time position of vehicles (traffic flow). Second, we implement a dynamic scheme to place controllers and UAVs (DCUPE) in the network based on the predicted flow. Third, in order to collect real-time traffic information and manage the network, we compute trajectories for the UAVs from real-time Bi-LSTM predictions of vehicle positions and an adaptive artificial bee colony algorithm for the traveling salesman problem (IDABC-TSP). We evaluate our proposed design as a function of energy cost, communication delay, and packet delivery ratio. Our experimental results show the effectiveness of our scheme on real geographical topologies.","PeriodicalId":136482,"journal":{"name":"Proceedings of the 18th ACM International Conference on Computing Frontiers","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114806509","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Joshua Klein, A. Levisse, G. Ansaloni, David Atienza Alonso, Marina Zapater, M. Dazzi, G. Karunaratne, I. Boybat, A. Sebastian, D. Rossi, Francesco Conti, Elana Pereira de Santana, P. Bolívar, M. Saeed, R. Negra, Zhenxing Wang, Kun-Ta Wang, M. Lemme, Akshay Jain, Robert Guirado, H. Taghvaee, S. Abadal
{"title":"Architecting more than Moore: wireless plasticity for massive heterogeneous computer architectures (WiPLASH)","authors":"Joshua Klein, A. Levisse, G. Ansaloni, David Atienza Alonso, Marina Zapater, M. Dazzi, G. Karunaratne, I. Boybat, A. Sebastian, D. Rossi, Francesco Conti, Elana Pereira de Santana, P. Bolívar, M. Saeed, R. Negra, Zhenxing Wang, Kun-Ta Wang, M. Lemme, Akshay Jain, Robert Guirado, H. Taghvaee, S. Abadal","doi":"10.1145/3457388.3458859","DOIUrl":"https://doi.org/10.1145/3457388.3458859","url":null,"abstract":"This paper presents the research directions pursued by the WiPLASH European project, pioneering on-chip wireless communications as a disruptive enabler towards next-generation computing systems for artificial intelligence (AI). We illustrate the holistic approach driving our research efforts, which encompass expertise and abstraction levels ranging from physical design of embedded graphene antennas to system-level evaluation of wirelessly-communicating heterogeneous systems.","PeriodicalId":136482,"journal":{"name":"Proceedings of the 18th ACM International Conference on Computing Frontiers","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130814926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Francesco Daghero, Chenhao Xie, D. J. Pagliari, A. Burrello, Marco Castellano, Luca Gandolfi, A. Calimera, E. Macii, M. Poncino
{"title":"Ultra-compact binary neural networks for human activity recognition on RISC-V processors","authors":"Francesco Daghero, Chenhao Xie, D. J. Pagliari, A. Burrello, Marco Castellano, Luca Gandolfi, A. Calimera, E. Macii, M. Poncino","doi":"10.1145/3457388.3458656","DOIUrl":"https://doi.org/10.1145/3457388.3458656","url":null,"abstract":"Human Activity Recognition (HAR) is a relevant inference task in many mobile applications. State-of-the-art HAR at the edge is typically achieved with lightweight machine learning models such as decision trees and Random Forests (RFs), whereas deep learning is less common due to its high computational complexity. In this work, we propose a novel implementation of HAR based on deep neural networks, and specifically on Binary Neural Networks (BNNs), targeting low-power general purpose processors with a RISC-V instruction set. BNNs yield very small memory footprints and low inference complexity, thanks to the replacement of arithmetic operations with bit-wise ones. However, existing BNN implementations on general purpose processors impose constraints tailored to complex computer vision tasks, which result in over-parametrized models for simpler problems like HAR. Therefore, we also introduce a new BNN inference library, which targets ultra-compact models explicitly. With experiments on a single-core RISC-V processor, we show that BNNs trained on two HAR datasets obtain higher classification accuracy compared to a state-of-the-art baseline based on RFs. Furthermore, our BNN reaches the same accuracy as an RF with either less memory (up to 91%) or better energy efficiency (up to 70%), depending on the complexity of the features extracted by the RF.","PeriodicalId":136482,"journal":{"name":"Proceedings of the 18th ACM International Conference on Computing Frontiers","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121303955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}