Contention-Aware GPU Thread Block Scheduler for Efficient GPU-SSD

IF 1.4 | Zone 3, Computer Science | JCR Q4, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IEEE Computer Architecture Letters Pub Date : 2025-07-07 DOI:10.1109/LCA.2025.3586312

Xueyang Liu; Seonjin Na; Euijun Chung; Jiashen Cao; Jing Yang; Hyesoon Kim

IEEE Computer Architecture Letters, vol. 24, no. 2, pp. 257-260. Journal Article. https://ieeexplore.ieee.org/document/11072283/

Citations: 0

Abstract


Contention-Aware GPU Thread Block Scheduler for Efficient GPU-SSD

The growing dataset sizes in LLM have made low-cost SSDs a popular solution for extending GPU memory in mobile devices. In this paper, we introduce CA-Scheduler, a contention-aware scheduling scheme for GPU-initiated SSD access. The key insight behind CA-Scheduler is leveraging the BSP GPU programming model, which allows reordering work at the thread block level to optimize SSD throughput. By capitalizing on the predictable memory access patterns of GPU thread blocks, CA-Scheduler anticipates SSD locations to minimize contention and improve performance.


Source journal

IEEE Computer Architecture Letters (COMPUTER SCIENCE, HARDWARE & ARCHITECTURE)

CiteScore

4.60

Self-citation rate

4.30%

Articles published

Journal description: IEEE Computer Architecture Letters is a rigorously peer-reviewed forum for publishing early, high-impact results in the areas of uni- and multiprocessor computer systems, computer architecture, microarchitecture, workload characterization, performance evaluation and simulation techniques, and power-aware computing. Submissions are welcomed on any topic in computer architecture, especially but not limited to: microprocessor and multiprocessor systems, microarchitecture and ILP processors, workload characterization, performance evaluation and simulation techniques, compiler-hardware and operating system-hardware interactions, interconnect architectures, memory and cache systems, power and thermal issues at the architecture level, I/O architectures and techniques, independent validation of previously published results, analysis of unsuccessful techniques, domain-specific processor architectures (e.g., embedded, graphics, network, etc.), real-time and high-availability architectures, reconfigurable systems.