Privacy-preserving Job Scheduler for GPU Sharing

2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing Workshops (CCGridW). Pub Date: 2023-05-01. DOI: 10.1109/CCGridW59191.2023.00077

Aritra Ray, Kyle Lafata, Zhaobo Zhang, Ying Xiong, K. Chakrabarty

Citations: 1

Abstract

Machine learning (ML) training jobs are resource-intensive. The high infrastructure cost of computing clusters encourages multi-tenancy on GPU resources. This raises a scheduling problem: how to assign multiple ML training jobs to a single GPU while minimizing task interference. Our paper introduces a clustering-based, privacy-preserving job scheduler that minimizes task interference without accessing sensitive user data. We characterize ML workloads, make the characterization publicly available [1], and perform exploratory data analysis to cluster the workloads. Based on these clusters, we build a knowledge base of inter- and intra-cluster task interference and use it to minimize task interference.
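The abstract does not give the scheduler's algorithm, so the following is only a minimal sketch of how a knowledge base of inter- and intra-cluster interference could drive a placement decision. Everything in it is an assumption made for illustration: the cluster labels "A" and "B", the slowdown values, the `INTERFERENCE` table, and the `pair_cost` and `pick_gpu` helpers are hypothetical and are not taken from the paper. The scheduler here sees only a job's cluster label, not its data or model.

```python
from typing import Dict, FrozenSet, Optional

# Hypothetical knowledge base: expected slowdown factor when jobs from two
# workload clusters share one GPU. The numbers are invented for illustration.
INTERFERENCE: Dict[FrozenSet[str], float] = {
    frozenset({"A"}): 1.8,       # intra-cluster: A co-located with A
    frozenset({"B"}): 1.3,       # intra-cluster: B co-located with B
    frozenset({"A", "B"}): 1.1,  # inter-cluster: A co-located with B
}


def pair_cost(cluster_1: str, cluster_2: str) -> float:
    """Look up the assumed slowdown for co-locating two cluster labels."""
    return INTERFERENCE[frozenset({cluster_1, cluster_2})]


def pick_gpu(new_cluster: str, gpus: Dict[str, Optional[str]]) -> str:
    """Return the GPU id with the lowest expected interference.

    `gpus` maps a GPU id to the cluster label of its resident job,
    or None if the GPU is idle (treated as slowdown 1.0, i.e. none).
    """
    def cost(gpu_id: str) -> float:
        resident = gpus[gpu_id]
        return 1.0 if resident is None else pair_cost(new_cluster, resident)

    # Sort ids first so ties are broken deterministically by GPU id.
    return min(sorted(gpus), key=cost)
```

Under these assumed values, a new job from cluster "A" would be placed next to a resident "B" job rather than next to another "A" job, because the table lists a lower slowdown for the A/B pair.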

