Menghong Guan , Haiyong Bao , Zhiqiang Li , Hao Pan , Cheng Huang , Hong-Ning Dai
{"title":"SAMFL: Secure Aggregation Mechanism for Federated Learning with Byzantine-robustness by functional encryption","authors":"Menghong Guan , Haiyong Bao , Zhiqiang Li , Hao Pan , Cheng Huang , Hong-Ning Dai","doi":"10.1016/j.sysarc.2024.103304","DOIUrl":"10.1016/j.sysarc.2024.103304","url":null,"abstract":"<div><div>Federated learning (FL) enables collaborative model training without sharing private data, thereby potentially meeting the growing demand for data privacy protection. Despite its potential, FL also poses challenges in achieving privacy-preservation and Byzantine-robustness when handling sensitive data. To address these challenges, we present a novel <strong>S</strong>ecure <strong>A</strong>ggregation <strong>M</strong>echanism for <strong>F</strong>ederated <strong>L</strong>earning with Byzantine-Robustness by Functional Encryption (SAMFL). Our approach designs a novel dual-decryption multi-input functional encryption (DD-MIFE) scheme, which enables efficient computation of cosine similarities and aggregation of encrypted gradients through a single ciphertext. This innovative scheme allows for dual decryption, producing distinct results based on different keys, while maintaining high efficiency. We further propose TF-Init, integrating DD-MIFE with Truth Discovery (TD) to eliminate the reliance on a root dataset. Additionally, we devise a secure cosine similarity calculation aggregation protocol (SC2AP) using DD-MIFE, ensuring privacy-preserving and Byzantine-robust FL secure aggregation. To enhance FL efficiency, we employ single instruction multiple data (SIMD) to parallelize encryption and decryption processes. Concurrently, to preserve accuracy, we incorporate differential privacy (DP) with selective clipping of model layers within the FL framework. Finally, we integrate TF-Init, SC2AP, SIMD, and DP to construct SAMFL. 
Extensive experiments demonstrate that SAMFL successfully defends against both inference attacks and poisoning attacks, while improving efficiency and accuracy compared to existing methods. SAMFL provides a comprehensive integrated solution for FL with efficiency, accuracy, privacy-preservation, and robustness.</div></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"157 ","pages":"Article 103304"},"PeriodicalIF":3.7,"publicationDate":"2024-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142651997","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Renping Liu , Xiao Xiao , Yang Zou , Peng Chen , Linbo Long , Anping Xiong , Duo Liu
{"title":"ZNS-Cleaner: Enhancing lifespan by reducing empty erase in ZNS SSDs","authors":"Renping Liu , Xiao Xiao , Yang Zou , Peng Chen , Linbo Long , Anping Xiong , Duo Liu","doi":"10.1016/j.sysarc.2024.103303","DOIUrl":"10.1016/j.sysarc.2024.103303","url":null,"abstract":"<div><div>The Zoned Namespace (ZNS) interface shifts data management responsibility to upper-level applications, which reclaims space by sending the zone-reset command to ZNS SSD devices. Due to a semantic barrier between upper-level applications and ZNS SSD devices, these applications struggle to understand the state of the flash blocks within the devices and arbitrarily send zone-reset commands to the devices. This results in significant empty pages being erased (empty erase), accelerating flash block aging and shortening the lifespan of ZNS SSDs.</div><div>To solve this problem, we decouple the zone-reset command at the upper-level applications from the erase operation at the devices, and propose ZNS-Cleaner to erase the flash blocks in ZNS SSDs. ZNS-Cleaner autonomously determines the timing of erasing, rather than relying on the zone-reset command. To fully use the empty pages, ZNS-Cleaner divides the storage space into page-level strips, and adopts these strips to reconstruct a new zone at runtime. 
Comprehensive evaluations show that ZNS-Cleaner reduces the empty erase by 87.2%, lowers the total erase count by 51.0%, decreases the max erase count of flash blocks by up to <span><math><mrow><mn>10</mn><mo>×</mo></mrow></math></span>, and prolongs the lifespan by <span><math><mrow><mn>5</mn><mo>.</mo><mn>2</mn><mo>×</mo></mrow></math></span> on average in ZNS SSDs.</div></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"157 ","pages":"Article 103303"},"PeriodicalIF":3.7,"publicationDate":"2024-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142651996","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Iosu Gomez , Unai Díaz de Cerio , Jorge Parra , Juan M. Rivas , J. Javier Gutiérrez , Michael González Harbour
{"title":"Using MAST for modeling and response-time analysis of real-time applications with GPUs","authors":"Iosu Gomez , Unai Díaz de Cerio , Jorge Parra , Juan M. Rivas , J. Javier Gutiérrez , Michael González Harbour","doi":"10.1016/j.sysarc.2024.103300","DOIUrl":"10.1016/j.sysarc.2024.103300","url":null,"abstract":"<div><div>The ever-increasing computing demands in embedded systems are driving the adoption of hardware accelerators such as GPUs, which offer powerful platforms that can compute parallel workloads efficiently. Relevant critical applications that benefit from such platforms, for instance autonomous driving, usually impose additional real-time requirements that must be met to guarantee the correctness of the systems. In this paper, we propose exploiting readily available and extensively validated techniques to model and analyze real-time systems with GPUs. Specifically, we propose a methodology to employ the MAST model to characterize such systems, and different variants of the Offset-Based Response-Time Analysis techniques to validate the real-time requirements. We verify our approach with a real industrial application sourced from the railway industry. 
Through a comprehensive evaluation involving synthetic and real task-sets, we characterize the applicability of the approach, and we also show how estimated worst-case response times are aligned with real measurements up to 87.2%.</div></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"157 ","pages":"Article 103300"},"PeriodicalIF":3.7,"publicationDate":"2024-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142651995","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yamilka Toca-Díaz, Rubén Gran Tejero, Alejandro Valero
{"title":"Shift-and-Safe: Addressing permanent faults in aggressively undervolted CNN accelerators","authors":"Yamilka Toca-Díaz, Rubén Gran Tejero, Alejandro Valero","doi":"10.1016/j.sysarc.2024.103292","DOIUrl":"10.1016/j.sysarc.2024.103292","url":null,"abstract":"<div><div>Underscaling the supply voltage (<span><math><msub><mrow><mi>V</mi></mrow><mrow><mi>d</mi><mi>d</mi></mrow></msub></math></span>) to ultra-low levels below the safe-operation threshold voltage (<span><math><msub><mrow><mi>V</mi></mrow><mrow><mi>m</mi><mi>i</mi><mi>n</mi></mrow></msub></math></span>) holds promise for substantial power savings in digital CMOS circuits. However, these benefits come with pronounced challenges due to the heightened risk of bitcell permanent faults stemming from process variations in current technology node sizes.</div><div>This work delves into the repercussions of such faults on the accuracy of a 16-bit fixed-point Convolutional Neural Network (CNN) inference accelerator powering on-chip activation memories at ultra-low <span><math><msub><mrow><mi>V</mi></mrow><mrow><mi>d</mi><mi>d</mi></mrow></msub></math></span> voltages. Through an in-depth examination of fault patterns, memory usage, and statistical analysis of activation values, this paper introduces Shift-and-Safe: two novel and cost-effective microarchitectural techniques exploiting the presence of outlier activation values and the underutilization of activation memories. Particularly, activation outliers enable a shift-based data representation that reduces the impact of faults on the activation values, whereas the memory underutilization is exploited to maintain a safe replica of affected activations in idle memory regions. 
Remarkably, these mechanisms do not add any burden to the programmer and are independent of application characteristics, rendering them easily deployable across real-world CNN accelerators.</div><div>Experimental results show that Shift-and-Safe maintains the CNN accuracy even in the presence of almost a quarter of the total activations with faults. In addition, average energy savings are 5% and 11% compared to the state-of-the-art approach and a conventional accelerator supplied at <span><math><msub><mrow><mi>V</mi></mrow><mrow><mi>m</mi><mi>i</mi><mi>n</mi></mrow></msub></math></span>, respectively.</div></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"157 ","pages":"Article 103292"},"PeriodicalIF":3.7,"publicationDate":"2024-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142651434","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Function Placement Approaches in Serverless Computing: A Survey","authors":"Mohsen Ghorbian, Mostafa Ghobaei-Arani, Rohollah Asadolahpour-Karimi","doi":"10.1016/j.sysarc.2024.103291","DOIUrl":"10.1016/j.sysarc.2024.103291","url":null,"abstract":"<div><div>Serverless computing is a new computing paradigm in cloud computing that allows developers to focus on code development without the need to manage infrastructure and enjoy the benefits of automatic scaling and low costs. The function placement mechanism is a critical concept in serverless computing that refers to choosing the optimal place for executing functions to improve the efficiency of resources and reduce the delay in executing functions. However, this process faces challenges such as the complexity of dynamic environments, heterogeneous resources, variable execution costs, and changes in the timing of requests, which make it challenging to choose the appropriate location for functions. This article provides a comprehensive overview of function placement mechanisms in serverless computing. It aims to introduce a comprehensive and systematic classification of critical approaches such as machine learning (ML)-based, heuristic-based, and model-based that are used in implementing function placement. Also, by examining each approach's strengths and weaknesses, this review article helps researchers and developers gain a better perspective on the existing solutions and approaches and avoid repeated efforts by comprehensively reviewing previous research. 
In addition, by identifying research gaps and introducing new paths, this research provides the basis for improving future research.</div></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"157 ","pages":"Article 103291"},"PeriodicalIF":3.7,"publicationDate":"2024-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142651998","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Implications of architecture and implementation choices on timing analysis of automotive CAN networks","authors":"Dongwen Yang , Marco Di Natale , Haibo Zeng","doi":"10.1016/j.sysarc.2024.103290","DOIUrl":"10.1016/j.sysarc.2024.103290","url":null,"abstract":"<div><div>The Controller Area Network (CAN) protocol is widely adopted in industries such as automotive. It has been analyzed in a number of studies in the real-time systems community to compute the worst case response time of messages, based on an ideal model of the behavior of the message queuing and CAN controller. Recently, more results are being added to study systems that are not constructed according to the ideal model.</div><div>In this paper, we further investigate the architecture choices that may affect the timing analysis. Specifically, we provide an assessment on the practical relevance of several analysis results. We also present theory and empirical studies on the relative importance of architecture implementation issues that are quite common in real systems but further deviate from the ideal behavior. We also experimentally evaluate the response time while using TxObjects without preemption. We propose a heuristic for the design of multiple software queues when TxObjects are not preemptable. 
Finally, we derive an upper bound on the worst case response time when message output at the CAN driver is polling based.</div></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"157 ","pages":"Article 103290"},"PeriodicalIF":3.7,"publicationDate":"2024-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142571478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Florian Meisel, Christoph Spang, David Volz, Andreas Koch
{"title":"TaPaFuzz: Hardware-accelerated RISC-V bare-metal firmware fuzzing using rapid job launches","authors":"Florian Meisel, Christoph Spang, David Volz, Andreas Koch","doi":"10.1016/j.sysarc.2024.103288","DOIUrl":"10.1016/j.sysarc.2024.103288","url":null,"abstract":"<div><div>Fuzz testing serves as a key technique in software security aimed at identifying unexpected program behaviors by repeatedly executing the target program with auto-generated random inputs. Testing is integral to IoT device security but is hampered by the minimal observability features of typical in-market IoT devices. Moreover, the slow nature of a RISC-V software emulation on x86 host CPUs and the inaccuracies introduced by compiling IoT applications to a different ISA for execution on host systems pose significant challenges. Our software-hardware co-design surmounts these hurdles. Fuzzing jobs are prepared and evaluated on a host computer, while the actual execution with high-throughput tracing is performed on an FPGA. Advances in the host-to-FPGA interface together with an accelerated reset procedure between fuzzer jobs effectively hide the costly host-FPGA communication, increasing the single-thread fuzzing performance by up to a factor of 11.7 over that of the leading QEMU-based fuzzer AFL++ running on a very fast x86 CPU. 
We demonstrate practical usability by evaluating our framework on a collection of applications in a bare-metal environment.</div></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"156 ","pages":"Article 103288"},"PeriodicalIF":3.7,"publicationDate":"2024-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142533508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Verifiable privacy-preserving semantic retrieval scheme in the edge computing","authors":"Jiaqi Guo , Cong Tian , Qiang He , Liang Zhao , Zhenhua Duan","doi":"10.1016/j.sysarc.2024.103289","DOIUrl":"10.1016/j.sysarc.2024.103289","url":null,"abstract":"<div><div>Edge computing, with its characteristics of low latency and low transmission costs, addresses the storage and computation challenges arising from the surge in network edge traffic. It enables users to leverage nearby edge servers for data outsourcing and retrieval. However, data outsourcing poses risks to data privacy. Although searchable encryption is proposed to secure search of outsourced data, existing schemes generally cannot meet the requirements of semantic search, and they also exhibit security risks and incur high search costs. In addition, edge servers may engage in malicious activities such as data tampering or forgery. Therefore, we propose a verifiable privacy-preserving semantic retrieval scheme named VPSR suitable for edge computing environments. We utilize the Doc2Vec method to extract text feature vectors and then convert them into matrix form to reduce storage space requirements for indexes, queries, and keys. We encrypt matrices using an improved secure k-nearest neighbor (kNN) algorithm based on learning with errors (LWE) and calculate text similarity by solving the Hadamard product between matrices. Additionally, we design an aggregable signature scheme and offload part of the result verification tasks to edge servers. 
Security and performance analysis results demonstrate that the VPSR scheme is suitable for edge computing environments with high encryption and search efficiency and low storage cost while ensuring security.</div></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"156 ","pages":"Article 103289"},"PeriodicalIF":3.7,"publicationDate":"2024-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142533507","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The rCOS framework for multi-dimensional separation of concerns in model-driven engineering","authors":"Bo Liu , Shmuel Tyszberowicz , Zhiming Liu","doi":"10.1016/j.sysarc.2024.103287","DOIUrl":"10.1016/j.sysarc.2024.103287","url":null,"abstract":"<div><div>The software industry increasingly turns to Model-Driven Engineering (MDE) to mitigate complexity by automating model creation and transformation. Many organisations are pursuing Integrated Development Platforms (IDPs) to enhance automation in their software development processes within MDE. However, the adoption of MDE and engagement with IDPs remain limited due to concerns over their efficacy. We address these challenges in this review paper by introducing a framework for the formal refinement of component and object systems (rCOS). It provides: (1) a formal theory that consists of a modelling language (named <em>OPL</em>) with a calculus of refinement for object-oriented models and component models; (2) a suite of analysis and design techniques that facilitate abstractions and decompositions, leading to a multidimensional separation of concerns; and (3) an IDP (named <em>rCOS Modeller</em>) that supports modelling, design and verification from requirements elicitation through to coding. By advocating for an rCOS-enabled multidimensional approach to separating concerns, this paper offers a comprehensive solution to the challenges facing MDE and IDPs, paving the way for their successful implementation in practice. 
By delineating the emerging challenges and prospects associated with integrating formal methods for modelling and designing human-cyber–physical systems (HCPS), we show the potential of extending rCOS for MDE in HCPS.</div></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"156 ","pages":"Article 103287"},"PeriodicalIF":3.7,"publicationDate":"2024-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142418576","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the degree of parallelism for parallel real-time tasks","authors":"Qingqiang He , Nan Guan , Zhe Jiang , Mingsong Lv","doi":"10.1016/j.sysarc.2024.103286","DOIUrl":"10.1016/j.sysarc.2024.103286","url":null,"abstract":"<div><div>The degree of parallelism, which measures how a task can execute concurrently, is an important characterization in scheduling. This paper studies the degree of parallelism in the domain of real-time scheduling of parallel tasks, including the DAG task model and the conditional DAG task model. The definition of the degree of parallelism for DAG tasks is clarified; the definition and computing algorithm of the degree of parallelism for conditional DAG tasks are proposed. By leveraging the degree of parallelism, new response time bounds are derived and simple but effective real-time scheduling approaches are presented. This research is the first work to study the degree of parallelism for conditional DAG tasks and explore its benefits in real-time scheduling. Experimental results demonstrate that the proposed scheduling approaches significantly outperform existing state-of-the-art methods.</div></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"156 ","pages":"Article 103286"},"PeriodicalIF":3.7,"publicationDate":"2024-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142418577","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}