Redefining Data Locality for Cross-Data Center Storage

Proceedings of the 2nd International Workshop on Software-Defined Ecosystems
Published: 2015-06-16
DOI: 10.1145/2756594.2756596

Kwangsung Oh, A. Raghavan, A. Chandra, J. Weissman


Citations: 9

Abstract

Many cloud applications exploit the diversity of storage options within a data center to achieve desired tradeoffs among cost, performance, and durability. Applications commonly combine memory, local disk, and archival storage tiers within a single data center to meet their needs. On Amazon, for example, hot data can be kept in memory using ElastiCache, and colder data in cheaper, slower storage such as S3. For user-facing applications, a recent trend is to place data across multiple data centers to reduce the latency of user access to their data. The conventional wisdom is that co-locating computation and storage within the same data center is key to application performance. As a result, applications running in a data center are often still limited to accessing local data. In this paper, using experiments on Amazon, Microsoft, and Google clouds, we show that this assumption is false. Accessing data in a nearby data center may be faster than local access, whether the remote data sits at a different level of the storage hierarchy or even at the same level. This can lead not only to better performance but also to reduced cost and simpler consistency policies, and it calls for reconsidering data locality in multi-data-center environments. It argues for expanding cloud storage tiers to include non-local storage options, and it has interesting implications for the design of distributed storage systems.
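The idea of treating storage in nearby data centers as extra tiers can be sketched as a simple selection rule. This is a minimal illustration under stated assumptions, not the authors' system. The `StorageOption` type, the `pick_option` function, the data-center names, and every latency and cost figure below are hypothetical placeholders, not measurements from the paper. The `cost_per_gb` field is assumed to be a caller-supplied effective cost that already folds in any inter-data-center transfer charges. Setting `local_only=True` reproduces the conventional "local data only" rule. Leaving it `False` lets remote tiers compete on equal terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageOption:
    """One candidate location to read an object from (hypothetical model)."""
    data_center: str    # illustrative data-center name, not a real region
    tier: str           # e.g. "memory", "local-disk", "object-store"
    latency_ms: float   # assumed measured or estimated read latency
    cost_per_gb: float  # assumed effective cost, including any transfer charge


def pick_option(options, local_dc, local_only=False, max_cost_per_gb=None):
    """Return the lowest-latency option that satisfies the constraints.

    local_only=True mimics the conventional rule of reading only from the
    application's own data center; local_only=False treats tiers in other
    data centers as additional entries in the storage hierarchy.
    """
    candidates = [
        o for o in options
        if (not local_only or o.data_center == local_dc)
        and (max_cost_per_gb is None or o.cost_per_gb <= max_cost_per_gb)
    ]
    if not candidates:
        raise ValueError("no storage option satisfies the constraints")
    # Prefer lower latency; break latency ties by lower cost.
    return min(candidates, key=lambda o: (o.latency_ms, o.cost_per_gb))


# Hypothetical inputs for illustration only (not results from the paper).
EXAMPLE_OPTIONS = [
    StorageOption("dc-a", "object-store", 40.0, 0.03),
    StorageOption("dc-a", "local-disk", 12.0, 0.10),
    StorageOption("dc-b", "memory", 5.0, 0.50),
    StorageOption("dc-b", "object-store", 30.0, 0.03),
]

# Local-only policy picks dc-a local-disk; the expanded policy picks dc-b memory.
local_choice = pick_option(EXAMPLE_OPTIONS, "dc-a", local_only=True)
expanded_choice = pick_option(EXAMPLE_OPTIONS, "dc-a")

# Same tier on both sides under a cost cap: dc-a object-store vs dc-b object-store.
cheap_local = pick_option(EXAMPLE_OPTIONS, "dc-a", local_only=True, max_cost_per_gb=0.05)
cheap_expanded = pick_option(EXAMPLE_OPTIONS, "dc-a", max_cost_per_gb=0.05)
```

The last pair of calls mirrors the abstract's "same level of the hierarchy" case. When both candidates are object stores, the expanded policy can still prefer the remote one if it happens to be faster under the assumed inputs.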
