{"title":"地理分布数据中心中的大数据放置策略","authors":"L. Bouhouch, M. Zbakh, C. Tadonki","doi":"10.1109/CloudTech49835.2020.9365881","DOIUrl":null,"url":null,"abstract":"With the pervasiveness of the \"Big Data\" characteristic together with the expansion of geographically distributed datacenters in the Cloud computing context, processing large- scale data applications has become a crucial issue. Indeed, the task of finding the most efficient way of storing massive data across distributed locations is increasingly complex. Furthermore, the execution time of a given task that requires several datasets might be dominated by the cost of data migrations/exchanges, which depends on the initial placement of the input datasets over the set of datacenters in the Cloud and also on the dynamic data management strategy. In this paper, we propose a data placement strategy to improve the workflow execution time through the reduction of the cost associated to data movements between geographically distributed datacenters, considering their characteristics such as storage capacity and read/write speeds. We formalize the overall problem and then propose a data placement algorithm structured into two phases. First, we compute the estimated transfer time to move all involved datasets from their respective locations to the one where the corresponding tasks are executed. Second, we apply a greedy algorithm in order to assign each dataset to the optimal datacenter w.r.t the overall cost of data migrations. The heterogeneity of the datacenters together with their characteristics (storage and bandwidth) are both taken into account. Our experiments are conducted using Cloudsim simulator. The obtained results show that our proposed strategy produces an efficient placement and actually reduces the overheads of the data movement compared to both a random assignment and a selected placement algorithm from the literature.","PeriodicalId":272860,"journal":{"name":"2020 5th International Conference on Cloud Computing and Artificial Intelligence: Technologies and Applications (CloudTech)","volume":"47 12 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"A Big Data Placement Strategy in Geographically Distributed Datacenters\",\"authors\":\"L. Bouhouch, M. Zbakh, C. Tadonki\",\"doi\":\"10.1109/CloudTech49835.2020.9365881\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"With the pervasiveness of the \\\"Big Data\\\" characteristic together with the expansion of geographically distributed datacenters in the Cloud computing context, processing large- scale data applications has become a crucial issue. Indeed, the task of finding the most efficient way of storing massive data across distributed locations is increasingly complex. Furthermore, the execution time of a given task that requires several datasets might be dominated by the cost of data migrations/exchanges, which depends on the initial placement of the input datasets over the set of datacenters in the Cloud and also on the dynamic data management strategy. In this paper, we propose a data placement strategy to improve the workflow execution time through the reduction of the cost associated to data movements between geographically distributed datacenters, considering their characteristics such as storage capacity and read/write speeds. We formalize the overall problem and then propose a data placement algorithm structured into two phases. First, we compute the estimated transfer time to move all involved datasets from their respective locations to the one where the corresponding tasks are executed. Second, we apply a greedy algorithm in order to assign each dataset to the optimal datacenter w.r.t the overall cost of data migrations. The heterogeneity of the datacenters together with their characteristics (storage and bandwidth) are both taken into account. Our experiments are conducted using Cloudsim simulator. The obtained results show that our proposed strategy produces an efficient placement and actually reduces the overheads of the data movement compared to both a random assignment and a selected placement algorithm from the literature.\",\"PeriodicalId\":272860,\"journal\":{\"name\":\"2020 5th International Conference on Cloud Computing and Artificial Intelligence: Technologies and Applications (CloudTech)\",\"volume\":\"47 12 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-11-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 5th International Conference on Cloud Computing and Artificial Intelligence: Technologies and Applications (CloudTech)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CloudTech49835.2020.9365881\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 5th International Conference on Cloud Computing and Artificial Intelligence: Technologies and Applications (CloudTech)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CloudTech49835.2020.9365881","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A Big Data Placement Strategy in Geographically Distributed Datacenters
With the pervasiveness of the "Big Data" characteristic together with the expansion of geographically distributed datacenters in the Cloud computing context, processing large- scale data applications has become a crucial issue. Indeed, the task of finding the most efficient way of storing massive data across distributed locations is increasingly complex. Furthermore, the execution time of a given task that requires several datasets might be dominated by the cost of data migrations/exchanges, which depends on the initial placement of the input datasets over the set of datacenters in the Cloud and also on the dynamic data management strategy. In this paper, we propose a data placement strategy to improve the workflow execution time through the reduction of the cost associated to data movements between geographically distributed datacenters, considering their characteristics such as storage capacity and read/write speeds. We formalize the overall problem and then propose a data placement algorithm structured into two phases. First, we compute the estimated transfer time to move all involved datasets from their respective locations to the one where the corresponding tasks are executed. Second, we apply a greedy algorithm in order to assign each dataset to the optimal datacenter w.r.t the overall cost of data migrations. The heterogeneity of the datacenters together with their characteristics (storage and bandwidth) are both taken into account. Our experiments are conducted using Cloudsim simulator. The obtained results show that our proposed strategy produces an efficient placement and actually reduces the overheads of the data movement compared to both a random assignment and a selected placement algorithm from the literature.