Automatic Stream Identification to Improve Flash Endurance in Data Centers

J. Bhimani, Zhengyu Yang, Jingpei Yang, Adnan Maruf, N. Mi, R. Pandurangan, Changho Choi, V. Balakrishnan
Journal: ACM Transactions on Storage (TOS)
Publication type: Journal Article
Published: 2022-03-29
DOI: 10.1145/3470007 (https://doi.org/10.1145/3470007)
Citations: 1

Abstract

The demand for high-performance I/O in Storage-as-a-Service (SaaS) is steadily increasing. To address this demand, NAND flash-based solid-state drives (SSDs) are commonly used in data centers as cache or top tiers in the storage rack, owing to their superior performance compared to traditional hard disk drives (HDDs). Meanwhile, as the capital expenditure of SSDs declines and their storage capacity increases, all-flash data centers are evolving to serve cloud services better than SSD-HDD hybrid data centers. During this transition, the biggest challenge is how to reduce the Write Amplification Factor (WAF) and improve the endurance of SSDs, since these devices have a limited number of program/erase cycles. A specific case is that storing data with different lifetimes in a single SSD (where data of the same lifetime means I/O streams with similar temporal fetching patterns, such as re-access frequency) can cause high WAF, reduce endurance, and degrade the performance of the SSD. Motivated by this, multi-stream SSDs have been developed to enable data with different lifetimes to be stored in different SSD regions. The logic behind this is to reduce the internal movement of data: when garbage collection is triggered, there is a high chance that the pages of each data block are either all invalid or all valid. However, the limitation of this technology is that the system needs to manually assign the same streamID to data with a similar lifetime. Unfortunately, when data arrives, it is not known how important the data is or how long it will stay unmodified. Moreover, according to our observations, with different definitions of lifetime (i.e., different calculation formulas based on selected features previously exhibited by the data, such as sequentiality and frequency), streamID identification may have varying impacts on the final WAF of multi-stream SSDs. 
Thus, in this article, we first develop a portable and adaptable framework to study the impacts of different workload features and their combinations on write amplification. We then propose a feature-based stream identification approach, which automatically correlates measurable workload attributes (such as I/O size and I/O rate) with high-level workload features (such as frequency and sequentiality) and determines a suitable combination of workload features for assigning streamIDs. Finally, we develop an adaptable stream assignment technique that dynamically assigns streamIDs for changing workloads. Our evaluation results show that our automated approach to stream detection and separation can effectively reduce the WAF by using appropriate features for stream assignment, with minimal implementation overhead.
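
To make the idea of feature-based stream assignment concrete, the following is a minimal sketch and not the authors' algorithm. It tracks two of the features named in the abstract (write frequency and sequentiality) per fixed-size chunk and maps a combined score to a streamID. The chunk size, the number of streams, the scoring formula, and all names (`StreamAssigner`, `assign`, `write_amplification`) are assumptions chosen for illustration. The helper `write_amplification` uses the common definition of WAF as total pages written to flash divided by pages written by the host.

```python
from collections import defaultdict


def write_amplification(host_pages, gc_copied_pages):
    """WAF = (host writes + pages copied by garbage collection) / host writes."""
    return (host_pages + gc_copied_pages) / host_pages


class StreamAssigner:
    """Toy feature-based stream assigner (illustrative assumption only)."""

    def __init__(self, chunk_size=1 << 20, num_streams=4):
        self.chunk_size = chunk_size          # hypothetical tracking granularity
        self.num_streams = num_streams        # hypothetical number of device streams
        self.write_count = defaultdict(int)   # frequency feature, per chunk
        self.seq_count = defaultdict(int)     # sequential writes seen, per chunk
        self.last_end = None                  # end offset of the previous write

    def assign(self, offset, size):
        """Record one write and return a streamID in [0, num_streams)."""
        chunk = offset // self.chunk_size
        self.write_count[chunk] += 1
        # A write is counted as sequential if it starts where the last one ended.
        if self.last_end is not None and offset == self.last_end:
            self.seq_count[chunk] += 1
        self.last_end = offset + size

        frequency = self.write_count[chunk]
        sequentiality = self.seq_count[chunk] / frequency
        # Assumed scoring: frequently rewritten, non-sequential data is treated
        # as short-lived ("hot"); sequential data is discounted.
        score = frequency * (1.0 - 0.5 * sequentiality)
        # Bucket the score on a power-of-two scale and clamp to valid streamIDs.
        bucket = int(score).bit_length() - 1
        return max(0, min(self.num_streams - 1, bucket))
```

In this sketch, a chunk that is overwritten repeatedly migrates to higher streamIDs, while a chunk written once stays in stream 0, so data expected to be invalidated together tends to share a stream. A real deployment would pass the resulting ID to the device through whatever stream or write-hint interface the platform provides, which is outside the scope of this example.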