File systems unfit as distributed storage backends: lessons from 10 years of Ceph evolution

Abutalib Aghayev, S. Weil, Michael Kuchnik, M. Nelson, G. Ganger, George Amvrosiadis
Proceedings of the 27th ACM Symposium on Operating Systems Principles, published 2019-10-27
DOI: 10.1145/3341301.3359656 (https://doi.org/10.1145/3341301.3359656)
Citations: 66

Abstract

For a decade, the Ceph distributed file system followed the conventional wisdom of building its storage backend on top of local file systems. This is the preferred choice for most distributed file systems today because it allows them to benefit from the convenience and maturity of battle-tested code. Ceph's experience, however, shows that this comes at a high price. First, developing a zero-overhead transaction mechanism is challenging. Second, metadata performance at the local level can significantly affect performance at the distributed level. Third, supporting emerging storage hardware is painstakingly slow. Ceph addressed these issues with BlueStore, a new backend designed to run directly on raw storage devices. In only two years since its inception, BlueStore outperformed previously established backends and was adopted by 70% of users in production. By running in user space and fully controlling the I/O stack, it has enabled space-efficient metadata and data checksums, fast overwrites of erasure-coded data, and inline compression, while decreasing performance variability and avoiding a series of performance pitfalls of local file systems. Finally, it makes the adoption of backwards-incompatible storage hardware possible, an important trait in a changing storage landscape that is learning to embrace hardware diversity.