Journal of Systems Architecture: Latest Articles

Managing real-time constraints through monitoring and analysis-driven edge orchestration
IF 3.7, Zone 2, Computer Science
Journal of Systems Architecture, vol. 163, Article 103403 | Pub Date: 2025-04-04 | DOI: 10.1016/j.sysarc.2025.103403
Daniel Casini, Paolo Pazzaglia, Matthias Becker
Abstract: Emerging real-time applications are increasingly moving to distributed heterogeneous platforms, under the promise of more powerful and flexible resource capabilities. This shift inevitably brings new challenges. The design space to deploy chains of threads is more complex, and sound estimates of worst-case execution times are harder to obtain. Additionally, the environment is more dynamic, requiring additional runtime flexibility on the part of the application itself. In this paper, we present an optimization-based approach to this problem. First, we present a model and real-time analysis for modern distributed edge applications. Second, we propose a design-time optimization problem to show how to set the main parameters characterizing such applications from a time-predictability perspective. Then, we present an orchestration and runtime decision-making mechanism that monitors execution times and allows for runtime reconfigurations, spanning from graceful degradation policies to re-distributions of workload. The paper concludes with a prototype implementation of the proposed approach on the QNX RTOS and its evaluation on a realistic case study based on an edge-based valet parking application.
Citations: 0
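The runtime mechanism mentioned in the abstract above (monitor execution times, then reconfigure) can be pictured with a minimal sketch. Everything in it is hypothetical: the class name, the sliding-window size, the budget, and the three action labels are assumptions chosen for illustration, not details from the paper, whose prototype targets the QNX RTOS rather than Python.

```python
from collections import deque


class ExecutionTimeMonitor:
    """Hypothetical monitor for one thread: keeps a sliding window of
    observed execution times and suggests an action when an assumed
    per-job budget is exceeded."""

    def __init__(self, budget_ms: float, window: int = 5):
        self.budget_ms = budget_ms
        # Only the most recent `window` observations are kept.
        self.samples = deque(maxlen=window)

    def record(self, exec_time_ms: float) -> str:
        """Store one observation and return the suggested action."""
        self.samples.append(exec_time_ms)
        overruns = sum(1 for t in self.samples if t > self.budget_ms)
        if overruns == 0:
            return "keep"          # all recent jobs within budget
        if overruns * 2 <= len(self.samples):
            return "degrade"       # at most half overran: graceful degradation
        return "redistribute"      # most jobs overran: move workload elsewhere
```

A caller would feed `record()` one measurement per job and act on the returned label; the half-of-window threshold is an arbitrary illustrative policy.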
A systematic review of multi-factor authentication in digital payment systems: NIST standards alignment and industry implementation analysis
IF 3.7, Zone 2, Computer Science
Journal of Systems Architecture, vol. 162, Article 103402 | Pub Date: 2025-03-26 | DOI: 10.1016/j.sysarc.2025.103402
Phat T. Tran-Truong, Minh Q. Pham, Ha X. Son, Dat L.T. Nguyen, Minh B. Nguyen, Khiem L. Tran, Loc C.P. Van, Kiet T. Le, Khanh H. Vo, Ngan N.T. Kim, Triet M. Nguyen, Anh T. Nguyen
Abstract: This survey presents a systematic evaluation of Multi-Factor Authentication (MFA) practices in digital payment systems, analyzing their alignment with NIST Special Publications 800-63 guidelines. Through a comprehensive review of 70 academic papers published between 2017 and 2024 and 13 industry-based authentication tools, we examine how current implementations measure against Identity Assurance Level (IAL) and Authentication Assurance Level (AAL) standards. Our analysis reveals a significant gap between theoretical capabilities proposed in academic research and actual industry implementations, with 33% of tools relying primarily on OTP-based authentication despite more advanced methods being available. The survey identifies emerging trends like biometric authentication adoption (60% of analyzed papers) and varying regulatory compliance across sectors, with payment systems demonstrating 77% alignment with standards while IoT and E-Service domains show fragmented approaches. We propose a framework for developing adaptive authentication systems that balance security requirements with user experience through context-aware risk assessment. This work provides valuable insights for researchers, practitioners, and policymakers working to enhance the security and usability of digital payment authentication systems.
Citations: 0
Joint optimization of layering and power allocation for scalable VR video in 6G networks based on Deep Reinforcement Learning
IF 3.7, Zone 2, Computer Science
Journal of Systems Architecture, vol. 162, Article 103401 | Pub Date: 2025-03-17 | DOI: 10.1016/j.sysarc.2025.103401
Junchao Yang, Hui Zhang, Wenxin Jiao, Zhiwei Guo, Fayez Alqahtani, Amr Tolba, Yu Shen
Abstract: With the advancement and application of virtual reality (VR) technology, there is a growing demand for network bandwidth and computational capabilities. To address the challenges of high bandwidth requirements, low latency demands, and intensive computational tasks in VR video transmission, this paper proposes a joint optimization method for layering and power allocation based on Deep Reinforcement Learning (DRL). The method focuses on the transmission of scalable VR videos in 6G networks, utilizing DRL to achieve a cloud-edge-end collaborative transmission framework, where tile-based scalable VR video is proactively cached to the MEC nodes, and the Asynchronous Advantage Actor-Critic (A3C) algorithm is adopted to jointly optimize dual-connected link resources, edge computing resources, and user terminal computing resources. Through simulation experiments, the effectiveness of the proposed algorithms was validated. The results show that compared to baseline algorithms and state-of-the-art methods, the proposed A3C algorithm effectively improves the average quality of experience (QoE) for VR users and maintains low latency under various sub-6G and millimeter wave link capacities. Furthermore, with increased Mobile Edge Computing (MEC) computing power and User Equipment (UE) computing capabilities, the proposed method can further improve QoE and reduce latency.
Citations: 0
Verifiable decentralized identity-based meta-computing in Industrial Internet of Things (IIoT)
IF 3.7, Zone 2, Computer Science
Journal of Systems Architecture, vol. 162, Article 103391 | Pub Date: 2025-03-13 | DOI: 10.1016/j.sysarc.2025.103391
Kai Ding, Tianxiu Xie, Keke Gai, Chennan Guo, Liangqi Lei, Dongjue Wang, Jing Yu, Liehuang Zhu, Weizhi Meng
Abstract: Meta-computing in the Industrial Internet of Things (IIoT) has advanced dramatically thanks to the massive computation power it offers for processing complex IIoT tasks. However, users' identities face security and verification issues, since emerging threats derive from dynamic inter-operations in the cross-organization context. Even though blockchain-based Decentralized Identity (DID) is an alternative that offers strengthened identity governance, the current verifiability of DID documents still suffers from vulnerabilities due to the involvement of less trustworthy third parties that maintain the storage of binding relationships between DID identifiers and public keys. In this paper, we propose a novel Verifiable and Searchable Decentralized Identity (VS-DID) model. We focus on the verifiability of DID documents and propose a verifiable registry scheme that ensures verifiable binding relationships. In order to enable efficient queries over large-scale users' identities in meta-computing IIoT, we develop an on-chain-off-chain query strategy that adopts a sliding-window accumulator. The experimental results show that our scheme reduces aggregate proof time and commitment time by 93.5% and 96.5%, respectively, compared to the Merkle SNARK scheme, while maintaining reasonable verification time, significantly improving the efficiency of DID registry in large-scale IIoT environments.
Citations: 0
Balancing I/O and wear-out distribution inside SSDs with optimized cache management
IF 3.7, Zone 2, Computer Science
Journal of Systems Architecture, vol. 162, Article 103392 | Pub Date: 2025-03-11 | DOI: 10.1016/j.sysarc.2025.103392
Jiaxu Wu, Jiaojiao Wu, Aobo Yang, Fan Yang, Zhigang Cai, Jianwei Liao
Abstract: NAND flash memory-based solid-state drives (SSDs) have been adopted as storage infrastructure in a wide range of computing systems. In order to service an I/O request, the logical page address (LPA) of the request should be mapped to a physical page address (PPA), termed page-level address mapping in SSDs. As a fundamental mapping scheme, static mapping needs a small-scale mapping table and ensures good read parallelism, but it may bring about uneven I/O and wear-out distribution across SSD parallel units (e.g. flash planes), thus resulting in low write efficiency. To mitigate the negative effects of static mapping, this paper proposes a novel cache management scheme to not only guarantee I/O responsiveness but also balance I/O and wear-out distribution. Specifically, we first introduce directly flushing a portion of data pages onto the flash array while they are cold and the target parallel units have endured a small number of erase operations. After that, we present a method for selecting victim data pages from the data cache, by referring to the factors of pending I/O requests and the wear-out level on the flash memory. Through a series of simulation experiments on selected block I/O traces of real-world applications, we show that our approach achieves an average I/O latency reduction of 16.1% compared to Baseline, 13.6% over GCaR, 12.4% over LCR, and 6.6% over ARB while simultaneously balancing I/O and wear-out distribution. These results demonstrate its superiority over existing state-of-the-art schemes.
Citations: 0
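The abstract above notes that static mapping can produce uneven I/O across parallel units. A minimal sketch of why: if the target plane is a fixed function of the LPA, an access pattern that happens to align with that function lands on a single plane. The modulo striping used here is an assumption for illustration (the abstract does not say which static function the paper considers), and the function names are hypothetical.

```python
from collections import Counter


def static_plane(lpa: int, num_planes: int) -> int:
    """Static mapping sketch: the plane is a fixed function of the LPA.
    Modulo striping is an assumed example, not the paper's scheme."""
    return lpa % num_planes


def write_distribution(lpas, num_planes: int) -> list:
    """Count how many of the given page writes land on each plane."""
    counts = Counter(static_plane(lpa, num_planes) for lpa in lpas)
    return [counts.get(plane, 0) for plane in range(num_planes)]
```

With 8 planes, 64 sequential LPAs spread evenly (8 writes per plane), while 64 LPAs with a stride of 8 all map to plane 0, which is the kind of skew a cache-level balancing policy would have to absorb.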
Subversion resistant identity-based signature
IF 3.7, Zone 2, Computer Science
Journal of Systems Architecture, vol. 162, Article 103385 | Pub Date: 2025-03-08 | DOI: 10.1016/j.sysarc.2025.103385
Mengdi Ouyang, Cuixiang Yang, Xiaojuan Liao, Fagen Li
Abstract: Identity-based cryptography (IBC) resolves the issue of certificate management, establishing itself as an evolving industry standard. Identity-based signature (IBS), an essential element of IBC, ensures integrity and authentication, playing a crucial role in the domains of internet of things (IoT) and cloud computing. Nevertheless, the “Snowden” event exposed how attackers subverted cryptographic algorithms’ implementations to undermine security and conduct mass surveillance. We explore a subversion attack (SA) model on IBS and define two properties: undetectability and strong key recoverability. Our SA enables recovery of the master private key and a private key through any two successive signatures, posing a greater challenge. Cryptographic reverse firewalls (RFs) are the main countermeasures to resist SAs. However, existing works necessitate the storage of randomness corresponding to various identities and fail to resist bit-by-bit SA. To address these issues, we formulate a system model and a security model for subversion-resistant identity-based signature (SR-IBS). Then, we establish an instance and prove SR-IBS’s security of existential unforgeability under chosen message attack (EUF-CMA) along with subversion resistance. Finally, we leverage the pypbc library to conduct a comprehensive experimental analysis. The results indicate that the execution difference between the subverted IBS and the pure one is around 2 ms, and that RFs add only approximately 0.5% to the overall execution across five different security levels. SR-IBS provides subversion resistance without imposing a high computation burden.
Citations: 0
A continuous leakage-resilient CCA secure identity-based key encapsulation mechanism in the standard model
IF 3.7, Zone 2, Computer Science
Journal of Systems Architecture, vol. 162, Article 103388 | Pub Date: 2025-03-05 | DOI: 10.1016/j.sysarc.2025.103388
Zirui Qiao, Yasi Zhu, Yanwei Zhou, Bo Yang
Abstract: In practical scenarios, attackers can exploit leakage attacks such as fault analysis and cold boot attacks to extract sensitive data, including secret keys, from cryptographic primitives. As a result, cryptographic primitives deemed secure under traditional ideal security models may no longer provide the expected level of security. To meet the demand for leakage resilience in identity-based key encapsulation mechanisms (IB-KEM), we introduce a concrete structure of a leakage-resilient IB-KEM, and its resistance to chosen-ciphertext attacks (CCA) is grounded in the decisional bilinear Diffie–Hellman hypothesis. Furthermore, to enhance the practicality of our IB-KEM construction, we investigate its continuous leakage resilience. We achieve protection against ongoing leakage attacks by implementing regular updates of user keys. Our analysis and comparisons with existing works demonstrate that our proposal can effectively mitigate leakage attacks and maintain high computational efficiency while ensuring CCA security. Additionally, leveraging the enhanced scalability provided by hierarchical identities, we explore the leakage resilience of the hierarchical identity-based key encapsulation mechanism (HIB-KEM). We propose a general formation of a leakage-tolerant (hierarchical) identity-based encryption scheme based on (H)IB-KEM, further underscoring the high availability of (H)IB-KEM.
Citations: 0
AP-LET: Enabling deterministic Pub/Sub communication in AUTOSAR Adaptive
IF 3.7, Zone 2, Computer Science
Journal of Systems Architecture, vol. 162, Article 103390 | Pub Date: 2025-03-04 | DOI: 10.1016/j.sysarc.2025.103390
Davide Bellassai, Claudio Scordino, Daniel Casini, Alessandro Biondi
Abstract: The automotive software industry is facing a paradigm shift driven by the need to develop more and more advanced functionality distributed on multiple electronic control units. The AUTOSAR Adaptive standard has been designed as a service-oriented architecture on top of a general-purpose operating system to tackle this paradigm shift. Nevertheless, it does not provide means to ensure deterministic communication, as required in safety-related components. This paper studies the integration of the System-Level Logical Execution Time (SL-LET) paradigm in AUTOSAR Adaptive. The key design challenges and requirements to support SL-LET in AUTOSAR Adaptive are described, highlighting how to overcome the considerable differences between the AUTOSAR Classic and Adaptive domains. Then, a meta-protocol named AP-LET is presented, together with two concrete instances: one based on high-priority tasks and another leveraging timestamps in the message payload to handle communications and ensure determinism. A complete implementation of both protocols is also described. AP-LET was finally evaluated with a realistic automotive application, showing its feasibility and effectiveness.
Citations: 0
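The abstract above mentions an instance that uses timestamps in the message payload to make communication deterministic. The general idea behind timestamp-gated delivery can be sketched as follows: each message carries the instant at which it should become visible, and the subscriber withholds it until then, so the observed data does not depend on when the message physically arrived. This is an illustration of that general idea only, not the AP-LET protocol; the class names, field names, and microsecond unit are all hypothetical.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Message:
    visible_at_us: int                  # assumed timestamp carried in the payload
    data: str = field(compare=False)    # payload is excluded from ordering


class TimestampGatedSubscriber:
    """Hypothetical subscriber that buffers messages and releases each
    one only when its embedded timestamp has been reached."""

    def __init__(self):
        self._pending = []  # min-heap ordered by visible_at_us

    def on_receive(self, msg: Message) -> None:
        # Arrival order and arrival time are deliberately ignored.
        heapq.heappush(self._pending, msg)

    def read(self, now_us: int) -> list:
        """Return the payloads whose visibility instant is <= now_us."""
        released = []
        while self._pending and self._pending[0].visible_at_us <= now_us:
            released.append(heapq.heappop(self._pending).data)
        return released
```

Because release depends only on the embedded timestamp, two messages that arrive out of order are still delivered in timestamp order.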
DynaNet: A dynamic BFT consensus framework
IF 3.7, Zone 2, Computer Science
Journal of Systems Architecture, vol. 162, Article 103386 | Pub Date: 2025-03-04 | DOI: 10.1016/j.sysarc.2025.103386
Jiarui Li, Guoyan Zhang, Duan Liu, Fei Zhou, Leibo Li, Weizhi Meng
Abstract: In a blockchain system, allowing nodes to change during runtime through consensus protocols is an effective way to enhance the scalability and security of the blockchain. However, some limitations of the traditional membership change methods prevent them from meeting the continuously expanding scale of blockchain networks. The paper proposes DynaNet, a membership change framework based on Byzantine fault tolerance (BFT) consensus. Compared to similar membership change methods, DynaNet can make any number of members join and leave the system with just one consensus round and offers tremendous flexibility: it can be integrated with most synchronous BFT consensus protocols, endowing them with membership change capabilities. We showcase its integration with HotStuff, called D-HotStuff, and prove its safety and liveness. The experiment shows that D-HotStuff performs well in latency and throughput, and its multi-node change mode is more practical.
Citations: 0
Beyond VABlock: Improving Transformer workloads through aggressive prefetching
IF 3.7 2区 计算机科学
Journal of Systems Architecture Pub Date : 2025-03-04 DOI: 10.1016/j.sysarc.2025.103389
Jane Rhee , Ikyoung Choi , Gunjae Koo , Yunho Oh , Myung Kuk Yoon
{"title":"Beyond VABlock: Improving Transformer workloads through aggressive prefetching","authors":"Jane Rhee ,&nbsp;Ikyoung Choi ,&nbsp;Gunjae Koo ,&nbsp;Yunho Oh ,&nbsp;Myung Kuk Yoon","doi":"10.1016/j.sysarc.2025.103389","DOIUrl":"10.1016/j.sysarc.2025.103389","url":null,"abstract":"<div><div>The memory capacity constraint of GPUs is a major challenge in running large deep learning workloads with their ever increasing memory requirements. To run a large Transformer model with limited GPU memory, programmers need to manually allocate and copy data between CPU and GPUs. This programming burden is eased by Unified Virtual Memory (UVM), which automatically manages data transfer through its demand paging scheme. However, using UVM can cause performance degradation, especially under memory oversubscription. In this paper, we analyze the memory behavior of inference in large Transformer models using real hardware and the open-source NVIDIA UVM driver. The default Tree-Based Neighborhood (TBN) prefetcher in the UVM driver supports page prefetching within a 2MB virtual address block (VABlock), but it only detects locality within a VABlock, limiting its effectiveness for large models. Our analysis reveals that this locality extends beyond the VABlock, which the default prefetcher cannot exploit. To address this, we propose a block-aware prefetcher that prefetches multiple contiguous VABlocks with greater aggressiveness. 
Our evaluation shows that this approach delivers an average 2.7x performance improvement over the default TBN prefetcher when GPU memory is oversubscribed.</div></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"162 ","pages":"Article 103389"},"PeriodicalIF":3.7,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143610428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
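The address arithmetic behind the abstract above can be sketched briefly. The 2 MB VABlock size comes from the abstract; the function names and the `degree` parameter (how many contiguous blocks to fetch) are hypothetical, and the sketch shows only which block ranges a cross-block prefetcher would target, not how the paper's prefetcher decides when or how aggressively to do so.

```python
VABLOCK_SIZE = 2 * 1024 * 1024  # 2 MB virtual address block, as stated in the abstract


def vablock_index(addr: int) -> int:
    """Index of the VABlock containing a virtual address."""
    return addr // VABLOCK_SIZE


def blocks_to_prefetch(fault_addr: int, degree: int) -> list:
    """Base addresses of `degree` contiguous VABlocks, starting with the
    block that contains the faulting address. `degree` is an assumed knob;
    degree=1 stays within a single VABlock."""
    first = vablock_index(fault_addr)
    return [(first + i) * VABLOCK_SIZE for i in range(degree)]
```

For a fault one page into the second VABlock, `blocks_to_prefetch(0x200000 + 4096, 3)` yields the base addresses `0x200000`, `0x400000`, and `0x600000`.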