Xueyang Liu;Seonjin Na;Euijun Chung;Jiashen Cao;Jing Yang;Hyesoon Kim
{"title":"竞争感知GPU线程块调度高效GPU- ssd","authors":"Xueyang Liu;Seonjin Na;Euijun Chung;Jiashen Cao;Jing Yang;Hyesoon Kim","doi":"10.1109/LCA.2025.3586312","DOIUrl":null,"url":null,"abstract":"The growing dataset sizes in LLM have made low-cost SSDs a popular solution for extending GPU memory in mobile devices. In this paper, we introduce <monospace>CA-Scheduler</monospace>, a contention-aware scheduling scheme for GPU-initiated SSD access. The key insight behind <monospace>CA-Scheduler</monospace> is leveraging the BSP GPU programming model, which allows reordering work at the thread block level to optimize SSD throughput. By capitalizing on the predictable memory access patterns of GPU thread blocks, <monospace>CA-Scheduler</monospace> anticipates SSD locations to minimize contention and improve performance.","PeriodicalId":51248,"journal":{"name":"IEEE Computer Architecture Letters","volume":"24 2","pages":"257-260"},"PeriodicalIF":1.4000,"publicationDate":"2025-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Contention-Aware GPU Thread Block Scheduler for Efficient GPU-SSD\",\"authors\":\"Xueyang Liu;Seonjin Na;Euijun Chung;Jiashen Cao;Jing Yang;Hyesoon Kim\",\"doi\":\"10.1109/LCA.2025.3586312\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The growing dataset sizes in LLM have made low-cost SSDs a popular solution for extending GPU memory in mobile devices. In this paper, we introduce <monospace>CA-Scheduler</monospace>, a contention-aware scheduling scheme for GPU-initiated SSD access. The key insight behind <monospace>CA-Scheduler</monospace> is leveraging the BSP GPU programming model, which allows reordering work at the thread block level to optimize SSD throughput. By capitalizing on the predictable memory access patterns of GPU thread blocks, <monospace>CA-Scheduler</monospace> anticipates SSD locations to minimize contention and improve performance.\",\"PeriodicalId\":51248,\"journal\":{\"name\":\"IEEE Computer Architecture Letters\",\"volume\":\"24 2\",\"pages\":\"257-260\"},\"PeriodicalIF\":1.4000,\"publicationDate\":\"2025-07-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Computer Architecture Letters\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/11072283/\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Computer Architecture Letters","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/11072283/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
Contention-Aware GPU Thread Block Scheduler for Efficient GPU-SSD
The growing dataset sizes in LLM have made low-cost SSDs a popular solution for extending GPU memory in mobile devices. In this paper, we introduce CA-Scheduler, a contention-aware scheduling scheme for GPU-initiated SSD access. The key insight behind CA-Scheduler is leveraging the BSP GPU programming model, which allows reordering work at the thread block level to optimize SSD throughput. By capitalizing on the predictable memory access patterns of GPU thread blocks, CA-Scheduler anticipates SSD locations to minimize contention and improve performance.
期刊介绍:
IEEE Computer Architecture Letters is a rigorously peer-reviewed forum for publishing early, high-impact results in the areas of uni- and multiprocessor computer systems, computer architecture, microarchitecture, workload characterization, performance evaluation and simulation techniques, and power-aware computing. Submissions are welcomed on any topic in computer architecture, especially but not limited to: microprocessor and multiprocessor systems, microarchitecture and ILP processors, workload characterization, performance evaluation and simulation techniques, compiler-hardware and operating system-hardware interactions, interconnect architectures, memory and cache systems, power and thermal issues at the architecture level, I/O architectures and techniques, independent validation of previously published results, analysis of unsuccessful techniques, domain-specific processor architectures (e.g., embedded, graphics, network, etc.), real-time and high-availability architectures, reconfigurable systems.