Patrick Raaf, André Brinkmann, E. Borba, Hossen Asadi, Sai Narasimhamurthy, John Bent, Mohamad El-Batal, Reza Salkhordeh
Journal: ACM Transactions on Storage
Publication date: 2024-07-23
Publication type: Journal Article
DOI: 10.1145/3678250 (https://doi.org/10.1145/3678250)
From SSDs Back to HDDs: Optimizing VDO to Support Inline Deduplication and Compression for HDDs as Primary Storage Media
Deduplication and compression are powerful techniques for reducing the amount of physical storage consumed relative to the quantity of logical data stored. Deduplication can impose significant performance overheads, as duplicate detection in large systems induces random accesses to the backend storage. These random accesses have led to the concern that deduplication for primary storage is incompatible with HDDs. Most inline data reduction solutions are therefore optimized for SSDs and discourage use with HDDs, even for sequential workloads.
In this work, we show that these concerns are valid if and only if the lessons learned from deduplication research are not applied. We have therefore investigated data reduction solutions for primary storage based on the Red Hat Virtual Disk Optimizer (VDO) and show that applying them directly can decrease sequential write performance for HDDs by a factor of 36. We then show that slight modifications to VDO, plus the integration of a very small SSD area, significantly improve performance, even beyond the performance without data reduction enabled, making HDDs more cost-efficient than SSDs for a wide range of mostly sequential cloud workloads. Additionally, these VDO optimizations do not require maintaining different code bases for HDDs and SSDs, and we therefore provide the first data reduction solution applicable to both storage media.
About the journal:
The ACM Transactions on Storage (TOS) is a journal that publishes original archival papers in the area of storage and closely related disciplines. Articles that appear in TOS tend either to present new techniques and concepts or to report novel experiences and experiments with practical systems. Storage is a broad and multidisciplinary area that comprises network protocols, resource management, data backup, replication, recovery, devices, security, and the theory of data coding, densities, and low-power operation. Potential synergies among these fields are expected to open up new research directions.