{"title":"面向混合云的数据重力和遵从意识分布式深度学习","authors":"Avinash Maurya, Jaiaid Mobin, M. M. Rafique","doi":"10.1109/HiPCW57629.2022.00012","DOIUrl":null,"url":null,"abstract":"To store large volumes of data concurrently from a diverse set of sources, data stores such as data silos, lakes, and warehouses, have been widely embraced by various organizations. Thanks to data fabric architectures, such scattered data (both structurally and geographically), can be accessed transparently at scale while adhering to various administrative regulations (e.g. governance, privacy, compliance, etc.). However, modern workload schedulers and distributed deep learning (DDL) runtimes are oblivious to the uneven data distribution across different storage services and compliance regulations, leading to sub-optimal resource utilization and training completion times. Al-though state-of-art workflow schedulers such as Apache Hadoop Yarn, Horovod, etc. exploit data locality, they require application developers to explicitly map data and resources available across various cloud services during job submission. These approaches are redundant and counterproductive for next-generation data fabric architectures that feature automated transparency and compliance abstractions for accessing disparate data sources with uneven data distribution. To this end, we propose an algorithm based on greedy programming that leverages the meta-data catalog of data fabric to efficiently determine training schedules based on data gravity, compliance, and resource availability. Our simulations based on synthetic data and resource distribution profiles demonstrate significant improvements in execution times and resource utilization compared to traditional DDL scheduling approaches in hybrid multi-cloud environments.","PeriodicalId":432185,"journal":{"name":"2022 IEEE 29th International Conference on High Performance Computing, Data and Analytics Workshop (HiPCW)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Towards Data Gravity and Compliance Aware Distributed Deep Learning on Hybrid Clouds\",\"authors\":\"Avinash Maurya, Jaiaid Mobin, M. M. Rafique\",\"doi\":\"10.1109/HiPCW57629.2022.00012\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"To store large volumes of data concurrently from a diverse set of sources, data stores such as data silos, lakes, and warehouses, have been widely embraced by various organizations. Thanks to data fabric architectures, such scattered data (both structurally and geographically), can be accessed transparently at scale while adhering to various administrative regulations (e.g. governance, privacy, compliance, etc.). However, modern workload schedulers and distributed deep learning (DDL) runtimes are oblivious to the uneven data distribution across different storage services and compliance regulations, leading to sub-optimal resource utilization and training completion times. Al-though state-of-art workflow schedulers such as Apache Hadoop Yarn, Horovod, etc. exploit data locality, they require application developers to explicitly map data and resources available across various cloud services during job submission. These approaches are redundant and counterproductive for next-generation data fabric architectures that feature automated transparency and compliance abstractions for accessing disparate data sources with uneven data distribution. 
To this end, we propose an algorithm based on greedy programming that leverages the meta-data catalog of data fabric to efficiently determine training schedules based on data gravity, compliance, and resource availability. Our simulations based on synthetic data and resource distribution profiles demonstrate significant improvements in execution times and resource utilization compared to traditional DDL scheduling approaches in hybrid multi-cloud environments.\",\"PeriodicalId\":432185,\"journal\":{\"name\":\"2022 IEEE 29th International Conference on High Performance Computing, Data and Analytics Workshop (HiPCW)\",\"volume\":\"16 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE 29th International Conference on High Performance Computing, Data and Analytics Workshop (HiPCW)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HiPCW57629.2022.00012\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE 29th International Conference on High Performance Computing, Data and Analytics Workshop (HiPCW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HiPCW57629.2022.00012","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Towards Data Gravity and Compliance Aware Distributed Deep Learning on Hybrid Clouds
To store large volumes of data concurrently from a diverse set of sources, data stores such as silos, lakes, and warehouses have been widely embraced by various organizations. Thanks to data fabric architectures, such scattered data (both structurally and geographically) can be accessed transparently at scale while adhering to various administrative regulations (e.g., governance, privacy, and compliance). However, modern workload schedulers and distributed deep learning (DDL) runtimes are oblivious to the uneven data distribution across different storage services and to compliance regulations, leading to suboptimal resource utilization and training completion times. Although state-of-the-art workflow schedulers such as Apache Hadoop YARN and Horovod exploit data locality, they require application developers to explicitly map data and resources available across various cloud services during job submission. These approaches are redundant and counterproductive for next-generation data fabric architectures that feature automated transparency and compliance abstractions for accessing disparate data sources with uneven data distribution. To this end, we propose a greedy scheduling algorithm that leverages the metadata catalog of the data fabric to efficiently determine training schedules based on data gravity, compliance, and resource availability. Our simulations based on synthetic data and resource distribution profiles demonstrate significant improvements in execution times and resource utilization compared to traditional DDL scheduling approaches in hybrid multi-cloud environments.
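To make the scheduling idea concrete, below is a minimal, purely illustrative Python sketch of one plausible reading of such a greedy placement loop: process the heaviest (highest-gravity) partitions first, admit only sites whose region satisfies the partition's compliance constraint, and prefer the site that already holds the data so no transfer is needed. All names, types, and the preference order (Site, Partition, schedule_greedy) are assumptions for this sketch, not the authors' algorithm; the paper's actual metadata-catalog queries and cost model are not reproduced here.

from dataclasses import dataclass

@dataclass
class Site:
    name: str
    region: str        # jurisdiction, used for compliance checks
    free_gpus: int     # available worker slots

@dataclass
class Partition:
    name: str
    size_gb: float          # proxy for data gravity
    location: str           # site currently holding the data
    allowed_regions: set    # compliance: regions where processing is permitted

def schedule_greedy(partitions, sites):
    """Greedily place the heaviest partitions first, preferring the
    colocated site, then any compliant site with free capacity."""
    placement = {}
    # Heaviest partitions first: they are the costliest to move.
    for part in sorted(partitions, key=lambda p: p.size_gb, reverse=True):
        # Candidates must satisfy the compliance constraint and have capacity.
        candidates = [s for s in sites
                      if s.region in part.allowed_regions and s.free_gpus > 0]
        if not candidates:
            placement[part.name] = None   # no compliant capacity available
            continue
        # Prefer the site already holding the data (zero transfer);
        # otherwise take the compliant site with the most free capacity.
        colocated = [s for s in candidates if s.name == part.location]
        chosen = colocated[0] if colocated else max(candidates,
                                                    key=lambda s: s.free_gpus)
        chosen.free_gpus -= 1
        placement[part.name] = chosen.name
    return placement

sites = [Site("aws-us", "US", 2), Site("on-prem-eu", "EU", 1)]
parts = [Partition("clickstream", 500.0, "aws-us", {"US"}),
         Partition("patients", 120.0, "on-prem-eu", {"EU"})]
print(schedule_greedy(parts, sites))
# {'clickstream': 'aws-us', 'patients': 'on-prem-eu'}

Processing partitions in decreasing size order reflects data gravity: the largest datasets are the most expensive to move, so they claim their colocated resources before smaller, more mobile partitions are placed.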