A hash-based post-quantum ring signature scheme for the Internet of Vehicles
Shuanggen Liu, Xiayi Zhou, Xu An Wang, Zixuan Yan, He Yan, Yurui Cao
Journal of Systems Architecture, Vol. 160, Article 103345. DOI: 10.1016/j.sysarc.2025.103345. Published 2025-01-23.

Abstract: With the rapid development of the Internet of Vehicles, securing data transmission has become crucial, especially given the threat posed by quantum computing to traditional digital signatures. This paper presents a hash-based post-quantum ring signature scheme built upon the XMSS hash-based signature framework, leveraging Merkle trees for efficient data organization and verification. In addition, the scheme is applied to the Internet of Vehicles, ensuring both anonymity and traceability while providing robust quantum-resistant security. Evaluation results indicate that, compared to other schemes, the proposed method achieves superior verification speed while ensuring data security and privacy.

{"title":"Component-based architectural regression test selection for modularized software systems","authors":"Mohammed Al-Refai , Mahmoud M. Hammad","doi":"10.1016/j.sysarc.2025.103343","DOIUrl":"10.1016/j.sysarc.2025.103343","url":null,"abstract":"<div><div>Regression testing is an essential part of software development, but it can be costly and require significant computational resources. Regression Test Selection (RTS) improves regression testing efficiency by only re-executing the tests that have been affected by code changes. Recently, dynamic and static RTS techniques for Java projects showed that selecting tests at a coarser granularity, class-level, is more effective than selecting tests at a finer granularity, method- or statement-level. However, prior techniques are mainly considering Java object-oriented projects but not modularized Java projects. Given the explicit support of architectural constructs introduced by the <em>Java Platform Module System (JPMS)</em> in the ninth edition of Java, these research efforts are not customized for component-based Java projects. To that end, we propose two static component-based RTS approaches called CORTS and its variant C2RTS tailored for component-based Java software systems. CORTS leverages the architectural information such as components and ports, specified in the module descriptor files, to construct module-level dependency graph and identify relevant tests. The variant, C2RTS, is a hybrid approach in which it integrates analysis at both the module and class levels, employing module descriptor files and compile-time information to construct the dependency graph and identify relevant tests.</div><div>We evaluated CORTS and C2RTS on 1200 revisions of 12 real-world open source software systems, and compared the results with those of class-level dynamic (Ekstazi) and static (STARTS) RTS approaches. The results showed that CORTS and C2RTS outperformed the static class-level RTS in terms of safety violation that measures to what extent an RTS technique misses test cases that should be selected. Using Ekstazi as the baseline, the average safety violation with respect to Ekstazi was 1.14% for CORTS, 2.21% for C2RTS, and 3.19% for STARTS. On the other hand, the results showed that CORTS and C2RTS selected more test cases than Ekstazi and STARTS. The average reduction in test suite size was 22.78% for CORTS and 43.47% for C2RTS comparing to the 68.48% for STARTS and 84.21% for Ekstazi. For all the studied subjects, CORTS and C2RTS reduced the size of the static dependency graphs compared to those generated by static class-level RTS, leading to faster graph construction and analysis for test case selection. Additionally, CORTS and C2RTS achieved reductions in overall end-to-end regression testing time compared to the retest-all strategy.</div></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"160 ","pages":"Article 103343"},"PeriodicalIF":3.7,"publicationDate":"2025-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143130300","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An efficient string solver for string constraints with regex-counting and string-length","authors":"Denghang Hu, Zhilin Wu","doi":"10.1016/j.sysarc.2025.103340","DOIUrl":"10.1016/j.sysarc.2025.103340","url":null,"abstract":"<div><div>Regular expressions (regex for short) and string-length function are widely used in string-manipulating programs. Counting is a frequently used feature in regexes that counts the number of matchings of sub-patterns. The state-of-the-art string solvers are incapable of solving string constraints with regex-counting and string-length efficiently, especially when the counting and length bounds are large. In this work, we propose an automata-theoretic approach for solving such class of string constraints. The main idea is to symbolically model the counting operators by registers in automata instead of unfolding them explicitly, thus alleviating the state explosion problem. Moreover, the string-length function is modeled by a register as well. As a result, the satisfiability of string constraints with regex-counting and string-length is reduced to the satisfiability of linear integer arithmetic, which the off-the-shelf SMT solvers can then solve. To improve the performance further, we also propose various optimization techniques. We implement the algorithms and validate our approach on 49,843 benchmark instances. The experimental results show that our approach can solve more instances than the state-of-the-art solvers, at a comparable or faster speed, especially when the counting and length bounds are large or when the counting operators are nested with some other counting operators or complement operators.</div></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"160 ","pages":"Article 103340"},"PeriodicalIF":3.7,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143237932","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
LE-GEMM: A lightweight emulation-based GEMM with precision refinement on GPU
Yu Zhang, Lu Lu, Zhanyu Yang, Zhihong Liang, Siliang Suo
Journal of Systems Architecture, Vol. 160, Article 103336. DOI: 10.1016/j.sysarc.2025.103336. Published 2025-01-17.

Abstract: Many specialized hardware units, such as Matrix Cores and Tensor Cores, have recently been designed and applied in various scientific computing scenarios. These units support tensor-level computation at different precisions on GPUs. Previous studies have proposed methods for computing single-precision General Matrix Multiplication (GEMM) with half-precision matrices; however, this routine often incurs a loss of accuracy, which limits its application. This paper proposes LE-GEMM, a Lightweight Emulation-based GEMM on GPU comprising a lightweight emulation algorithm, a thread-parallelism analytic model, and an efficient multi-level pipeline implementation to accelerate computation without compromising accuracy requirements. First, the lightweight emulation algorithm combines a precision transformation process with GEMM emulation to achieve better computational accuracy and performance. Second, the thread-parallelism analytic model analyzes and guides the selection of the optimal tiling scheme for a given computing scenario and hardware. Third, the multi-level pipeline maximizes instruction-level parallelism and latency hiding. Comparison experiments were conducted on two commonly used GPU platforms, one AMD and one NVIDIA. The experimental results show that the proposed method outperforms previous approaches in both computational accuracy and speed.

A CP-ABE-based access control scheme with cryptographic reverse firewall for IoV
Xiaodong Yang, Xilai Luo, Zefan Liao, Wenjia Wang, Xiaoni Du, Shudong Li
Journal of Systems Architecture, Vol. 160, Article 103331. DOI: 10.1016/j.sysarc.2025.103331. Published 2025-01-17.

Abstract: The convergence of AI and internet technologies has sparked significant interest in the Internet of Vehicles (IoV) and intelligent transportation systems (ITS). However, the vast data generated within these systems poses challenges for onboard terminals and secure data sharing. To address these issues, we propose a novel solution combining ciphertext-policy attribute-based encryption (CP-ABE) and a cryptographic reverse firewall (CRF) mechanism for the IoV. This approach offers several advantages, including offline encryption and outsourced decryption to improve efficiency. The CRF mechanism adds an extra layer of security by re-randomizing vehicle data, protecting sensitive information. While single-attribute-authority schemes simplify access control, they are not ideal for IoV environments; therefore, we introduce a multi-authority scheme to enhance security. Performance analysis demonstrates our scheme’s ability to optimize encryption and decryption while safeguarding vehicle data confidentiality. In summary, our solution improves data management, access control, and security in the IoV, contributing to its safe and efficient development.

{"title":"REC: Enhancing fine-grained cache coherence protocol in multi-GPU systems","authors":"Gun Ko, Jiwon Lee, Hongju Kal, Hyunwuk Lee, Won Woo Ro","doi":"10.1016/j.sysarc.2025.103339","DOIUrl":"10.1016/j.sysarc.2025.103339","url":null,"abstract":"<div><div>With the increasing demands of modern workloads, multi-GPU systems have emerged as a scalable solution, extending performance beyond the capabilities of single GPUs. However, these systems face significant challenges in managing memory across multiple GPUs, particularly due to the Non-Uniform Memory Access (NUMA) effect, which introduces latency penalties when accessing remote memory. To mitigate NUMA overheads, GPUs typically cache remote memory accesses across multiple levels of the cache hierarchy, which are kept coherent using cache coherence protocols. The traditional GPU bulk-synchronous programming (BSP) model relies on coarse-grained invalidations and cache flushes at kernel boundaries, which are insufficient for the fine-grained communication patterns required by emerging applications. In multi-GPU systems, where NUMA is a major bottleneck, substantial data movement resulting from the bulk cache invalidations exacerbates performance overheads. Recent cache coherence protocol for multi-GPUs enables flexible data sharing through coherence directories that track shared data at a fine-grained level across GPUs. However, these directories limited in capacity, leading to frequent evictions and unnecessary invalidations, which increase cache misses and degrade performance. To address these challenges, we propose REC, a low-cost architectural solution that enhances the effective tracking capacity of coherence directories by leveraging memory access locality. REC coalesces multiple tag addresses from remote read requests within common address ranges, reducing directory storage overhead while maintaining fine-grained coherence for writes. Our evaluation on a 4-GPU system shows that REC reduces L2 cache misses by 53.5% and improves overall system performance by 32.7% across a variety of GPU workloads.</div></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"160 ","pages":"Article 103339"},"PeriodicalIF":3.7,"publicationDate":"2025-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143130301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ChipAI: A scalable chiplet-based accelerator for efficient DNN inference using silicon photonics","authors":"Hao Zhang , Haibo Zhang , Zhiyi Huang , Yawen Chen","doi":"10.1016/j.sysarc.2024.103308","DOIUrl":"10.1016/j.sysarc.2024.103308","url":null,"abstract":"<div><div>To enhance the precision of inference, deep neural network (DNN) models have been progressively growing in scale and complexity, leading to increased latency and computational resource demands. This growth necessitates scalable architectures, such as chiplet-based accelerators, to accommodate the substantial volume of deep learning inference tasks. However, the efficiency, energy consumption, and scalability of existing accelerators are severely constrained by metallic interconnects. Photonic interconnects, on the contrary, offer a promising alternative, with their advantages of low latency, high bandwidth, high energy efficiency, and simplified communication processes. In this paper, we propose ChipAI, an accelerator designed based on photonic interconnects for accelerating DNN inference tasks. ChipAI implements an efficient hybrid optical network that supports effective inter-chiplet and intra-chiplet data sharing, thereby enhancing parallel processing capabilities. Additionally, we propose a flexible dataflow leveraging the ChipAI architecture and the characteristics of DNN models, facilitating efficient architectural mapping of DNN layers. Simulation on various DNN models demonstrates that, compared to the state-of-the-art chiplet-based DNN accelerator with photonic interconnects, ChipAI can reduce the DNN inference time and energy consumption by up to 82% and 79%, respectively.</div></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"158 ","pages":"Article 103308"},"PeriodicalIF":3.7,"publicationDate":"2024-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142743147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Non-interactive set intersection for privacy-preserving contact tracing","authors":"Axin Wu , Yuer Yang , Jinghang Wen , Yu Zhang , Qiuxia Zhao","doi":"10.1016/j.sysarc.2024.103307","DOIUrl":"10.1016/j.sysarc.2024.103307","url":null,"abstract":"<div><div>Contact tracing (CT) is an effective method to combat the spread of infectious diseases like COVID-19, by notifying and alerting individuals who have been in contact with infected patients. One simple yet practical approach for implementing CT functionality is to directly publish the travel history and locations visited by infected users. However, this approach compromises the location privacy and makes infected individuals reluctant to participate in such systems. Private set intersection (PSI) is a promising candidate to protect the privacy of participants. But, interactive PSI protocols may not be friendly for querists with limited resources due to high local computation costs and communication bandwidth requirements. Additionally, concerns about identity leakage may result in infected users missing or providing erroneous information about their visited locations. To address the above issues, we propose a cloud-assisted non-interactive framework for privacy-preserving CT, which enables querists to obtain query results without multi-round interaction and addresses concerns regarding location and identity information leakage. Its core building block is a cloud-assisted non-interactive set intersection protocol, skillfully transformed from anonymous broadcast encryption (AnoBE). To our knowledge, this is the first derivation from AnoBE. We also instantiate the proposed framework and thoroughly evaluate its performance, demonstrating its efficiency.</div></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"158 ","pages":"Article 103307"},"PeriodicalIF":3.7,"publicationDate":"2024-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142722449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Survey of Edge Caching Security: Framework, Methods, and Challenges","authors":"Hang Zhang, Jinsong Wang, Zening Zhao, Zhao Zhao","doi":"10.1016/j.sysarc.2024.103306","DOIUrl":"10.1016/j.sysarc.2024.103306","url":null,"abstract":"<div><div>Edge caching reduces the frequent communication between users and remote cloud servers by caching popular content at the network edge, which can decrease response latency and improve the user service experience. However, the openness and vulnerability of edge caching introduce several security risks. The existing research work on edge caching security only focuses on certain specific aspects and does not consider edge caching security from a global perspective. Therefore, the paper provides a comprehensive review of edge caching security in order to accelerate the development of related research areas. Specifically, this paper first introduces the traditional and extended models of edge caching, the threats to edge caching, and the key metrics for implementing edge caching security. Then, we propose a comprehensive security framework of edge caching that considers content request security, content transmission security, content caching security, and multi-party trusted collaboration. Moreover, the four aspects of security framework are respectively discussed in detail, aiming to achieve security protection for edge caching. Finally, a discussion is provided on the shortcomings of current edge caching security and potential future directions.</div></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"158 ","pages":"Article 103306"},"PeriodicalIF":3.7,"publicationDate":"2024-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142743150","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NLTSP: A cost model for tensor program tuning using nested loop trees","authors":"Xinghe Qin , Yunchun Li , Fengxu Lin , Wei Li","doi":"10.1016/j.sysarc.2024.103305","DOIUrl":"10.1016/j.sysarc.2024.103305","url":null,"abstract":"<div><div>This paper introduces NLTSP, a deep learning-based cost model designed to optimize tensor program performance in deep learning compilers. NLTSP, short for Nested Loop Tree Structure Processing, facilitates tensor program tuning by extracting information directly from the nested loop tree structure of sampled programs. NLTSP extracts features upstream in the compilation flow and eliminates the need for complex feature engineering. By utilizing a unified format for CPU and GPU architectures and extracting simple high-level features, NLTSP significantly accelerates feature extraction speed while maintaining performance accuracy. We have integrated this technology into Ansor, a leading search framework in the TVM compiler, and conducted experiments. Compared with TenSet MLP, the state-of-the-art cost model utilizing Ansor features as inputs, NLTSP achieves feature extraction speeds on CPU and GPU that are, on average, 97.9 times and 41.4 times faster, respectively, and can reduce the average search time for CPU and GPU workloads by 2.50 times and 4.11 times, respectively. It is worth noting that NLTSP is not specifically designed for Ansor. Any auto-tuning framework capable of representing scheduled tensor programs as nested loop trees can potentially benefit from using NLTSP to achieve superior performance. The code is available at <span><span>https://github.com/xhq0/NLTSP</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"158 ","pages":"Article 103305"},"PeriodicalIF":3.7,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142700481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}