SES-Dedup: a Case for Low-Cost ECC-based SSD Deduplication

2019 35th Symposium on Mass Storage Systems and Technologies (MSST). Pub Date: 2019-05-01. DOI: 10.1109/MSST.2019.00009

Zhichao Yan, Hong Jiang, Song Jiang, Yujuan Tan, Hao Luo

Citations: 3

Abstract

Integrating data deduplication into Solid State Drives (SSDs) avoids writing duplicate content to NAND flash chips. This reduces the number of Program/Erase (P/E) operations, which extends the device's lifespan, and proportionally enlarges the SSD's effective capacity, which improves the performance of background maintenance tasks such as wear-leveling (WL) and garbage collection (GC). However, these benefits come at a non-trivial computational cost, because the embedded SSD controller must compute cryptographic hashes. To address this overhead, some researchers have suggested replacing cryptographic hashes with the error correction codes (ECCs) already embedded in SSD chips to detect duplicate content. However, all existing attempts have ignored the data randomization (scrambler) module that is widely used in modern SSDs, which makes it impractical to integrate ECC-based deduplication directly into commercial SSDs. In this work, we revisit the SSD's internal structure and propose the first deduplicatable SSD that can bypass the data scrambler module to enable low-cost ECC-based data deduplication. Specifically, we propose two design solutions, one on the host side and the other on the device side. With our approach, the SSD's built-in ECC module can be used to calculate the hash values of stored data for deduplication. We evaluated SES-Dedup by replaying data traces in an SSD simulator and found that it can remove up to 30.8% redundant data, with up to 17.0% write performance improvement over the baseline SSD.
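The scrambler problem described in the abstract can be illustrated with a toy model. The sketch below is not the authors' design: it assumes a hypothetical scrambler that XORs each page with a keystream seeded by the page address, and it uses CRC32 as a stand-in for real ECC parity bits (actual SSDs use codes such as BCH or LDPC). All function names are illustrative. It shows only why identical pages stop producing identical check bits once they pass through an address-dependent scrambler.

```python
import random
import zlib


def scramble(data: bytes, page_addr: int) -> bytes:
    """Toy scrambler (assumption): XOR with a keystream seeded by the page address."""
    rng = random.Random(page_addr)
    keystream = bytes(rng.randrange(256) for _ in range(len(data)))
    return bytes(d ^ k for d, k in zip(data, keystream))


def parity_fingerprint(data: bytes) -> int:
    """Stand-in for ECC parity bits. CRC32 is used here, not a real BCH/LDPC code."""
    return zlib.crc32(data)


def write_pages(pages, fingerprint_before_scrambler):
    """Simulate page writes and return (pages_written, pages_deduplicated)."""
    seen = {}  # fingerprint -> original page content
    written = dedup = 0
    for addr, data in enumerate(pages):
        # Choose whether the fingerprint sees raw or scrambled bytes.
        payload = data if fingerprint_before_scrambler else scramble(data, addr)
        fp = parity_fingerprint(payload)
        # A byte-wise comparison guards against fingerprint collisions.
        if fp in seen and seen[fp] == data:
            dedup += 1
        else:
            seen[fp] = data
            written += 1
    return written, dedup


if __name__ == "__main__":
    pages = [b"A" * 4096, b"B" * 4096, b"A" * 4096, b"A" * 4096]
    print("fingerprint on raw data:      ", write_pages(pages, True))
    print("fingerprint on scrambled data:", write_pages(pages, False))
```

In this toy setup, fingerprinting the raw data detects both repeated pages, while fingerprinting after the address-seeded scrambler is expected to detect none, because each copy is XORed with a different keystream. This is the gap that bypassing the scrambler is meant to close.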
