{"title":"面向混合云的数据重力和遵从意识分布式深度学习","authors":"Avinash Maurya, Jaiaid Mobin, M. M. Rafique","doi":"10.1109/HiPCW57629.2022.00012","DOIUrl":null,"url":null,"abstract":"To store large volumes of data concurrently from a diverse set of sources, data stores such as data silos, lakes, and warehouses, have been widely embraced by various organizations. Thanks to data fabric architectures, such scattered data (both structurally and geographically), can be accessed transparently at scale while adhering to various administrative regulations (e.g. governance, privacy, compliance, etc.). However, modern workload schedulers and distributed deep learning (DDL) runtimes are oblivious to the uneven data distribution across different storage services and compliance regulations, leading to sub-optimal resource utilization and training completion times. Al-though state-of-art workflow schedulers such as Apache Hadoop Yarn, Horovod, etc. exploit data locality, they require application developers to explicitly map data and resources available across various cloud services during job submission. These approaches are redundant and counterproductive for next-generation data fabric architectures that feature automated transparency and compliance abstractions for accessing disparate data sources with uneven data distribution. To this end, we propose an algorithm based on greedy programming that leverages the meta-data catalog of data fabric to efficiently determine training schedules based on data gravity, compliance, and resource availability. Our simulations based on synthetic data and resource distribution profiles demonstrate significant improvements in execution times and resource utilization compared to traditional DDL scheduling approaches in hybrid multi-cloud environments.","PeriodicalId":432185,"journal":{"name":"2022 IEEE 29th International Conference on High Performance Computing, Data and Analytics Workshop (HiPCW)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Towards Data Gravity and Compliance Aware Distributed Deep Learning on Hybrid Clouds\",\"authors\":\"Avinash Maurya, Jaiaid Mobin, M. M. Rafique\",\"doi\":\"10.1109/HiPCW57629.2022.00012\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"To store large volumes of data concurrently from a diverse set of sources, data stores such as data silos, lakes, and warehouses, have been widely embraced by various organizations. Thanks to data fabric architectures, such scattered data (both structurally and geographically), can be accessed transparently at scale while adhering to various administrative regulations (e.g. governance, privacy, compliance, etc.). However, modern workload schedulers and distributed deep learning (DDL) runtimes are oblivious to the uneven data distribution across different storage services and compliance regulations, leading to sub-optimal resource utilization and training completion times. Al-though state-of-art workflow schedulers such as Apache Hadoop Yarn, Horovod, etc. exploit data locality, they require application developers to explicitly map data and resources available across various cloud services during job submission. These approaches are redundant and counterproductive for next-generation data fabric architectures that feature automated transparency and compliance abstractions for accessing disparate data sources with uneven data distribution. 
To this end, we propose an algorithm based on greedy programming that leverages the meta-data catalog of data fabric to efficiently determine training schedules based on data gravity, compliance, and resource availability. Our simulations based on synthetic data and resource distribution profiles demonstrate significant improvements in execution times and resource utilization compared to traditional DDL scheduling approaches in hybrid multi-cloud environments.\",\"PeriodicalId\":432185,\"journal\":{\"name\":\"2022 IEEE 29th International Conference on High Performance Computing, Data and Analytics Workshop (HiPCW)\",\"volume\":\"16 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE 29th International Conference on High Performance Computing, Data and Analytics Workshop (HiPCW)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HiPCW57629.2022.00012\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE 29th International Conference on High Performance Computing, Data and Analytics Workshop (HiPCW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HiPCW57629.2022.00012","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Towards Data Gravity and Compliance Aware Distributed Deep Learning on Hybrid Clouds
To store large volumes of data concurrently from a diverse set of sources, data stores such as silos, lakes, and warehouses have been widely embraced by various organizations. Thanks to data fabric architectures, such scattered data (both structurally and geographically) can be accessed transparently at scale while adhering to various administrative regulations (e.g., governance, privacy, and compliance). However, modern workload schedulers and distributed deep learning (DDL) runtimes are oblivious to the uneven data distribution across different storage services and to compliance regulations, leading to suboptimal resource utilization and training completion times. Although state-of-the-art workflow schedulers such as Apache Hadoop YARN and Horovod exploit data locality, they require application developers to explicitly map data and resources available across various cloud services during job submission. These approaches are redundant and counterproductive for next-generation data fabric architectures that feature automated transparency and compliance abstractions for accessing disparate data sources with uneven data distribution. To this end, we propose a greedy scheduling algorithm that leverages the metadata catalog of the data fabric to efficiently determine training schedules based on data gravity, compliance, and resource availability. Our simulations based on synthetic data and resource distribution profiles demonstrate significant improvements in execution times and resource utilization compared to traditional DDL scheduling approaches in hybrid multi-cloud environments.
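To make the scheduling idea concrete, below is a minimal, purely illustrative Python sketch of one plausible reading of such a greedy placement loop: process the heaviest (highest-gravity) partitions first, admit only sites whose region satisfies the partition's compliance constraint, and prefer the site that already holds the data so no transfer is needed. All names, types, and the preference order (Site, Partition, schedule_greedy) are assumptions for this sketch, not the authors' algorithm; the paper's actual metadata-catalog queries and cost model are not reproduced here.

from dataclasses import dataclass

@dataclass
class Site:
    name: str
    region: str        # jurisdiction, used for compliance checks
    free_gpus: int     # available worker slots

@dataclass
class Partition:
    name: str
    size_gb: float          # proxy for data gravity
    location: str           # site currently holding the data
    allowed_regions: set    # compliance: regions where processing is permitted

def schedule_greedy(partitions, sites):
    """Greedily place the heaviest partitions first, preferring the
    colocated site, then any compliant site with free capacity."""
    placement = {}
    # Heaviest partitions first: they are the costliest to move.
    for part in sorted(partitions, key=lambda p: p.size_gb, reverse=True):
        # Candidates must satisfy the compliance constraint and have capacity.
        candidates = [s for s in sites
                      if s.region in part.allowed_regions and s.free_gpus > 0]
        if not candidates:
            placement[part.name] = None   # no compliant capacity available
            continue
        # Prefer the site already holding the data (zero transfer);
        # otherwise take the compliant site with the most free capacity.
        colocated = [s for s in candidates if s.name == part.location]
        chosen = colocated[0] if colocated else max(candidates,
                                                    key=lambda s: s.free_gpus)
        chosen.free_gpus -= 1
        placement[part.name] = chosen.name
    return placement

sites = [Site("aws-us", "US", 2), Site("on-prem-eu", "EU", 1)]
parts = [Partition("clickstream", 500.0, "aws-us", {"US"}),
         Partition("patients", 120.0, "on-prem-eu", {"EU"})]
print(schedule_greedy(parts, sites))
# {'clickstream': 'aws-us', 'patients': 'on-prem-eu'}

Processing partitions in decreasing size order reflects data gravity: the largest datasets are the most expensive to move, so they claim their colocated resources before smaller, more mobile partitions are placed.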