Characterizing the Performance of Intel Optane Persistent Memory: A Close Look at its On-DIMM Buffering

Proceedings of the Seventeenth European Conference on Computer Systems. Pub Date: 2022-03-28. DOI: 10.1145/3492321.3519556

Lingfeng Xiang, Xingsheng Zhao, J. Rao, Song Jiang, Hong Jiang

Citations: 28

Abstract

We present a comprehensive, in-depth study of Intel Optane DC persistent memory (DCPMM). Our focus is on the internal design of Optane's on-DIMM read and write buffering and its impact on application-perceived performance, read and write amplification, the overhead of different types of persists, and the tradeoffs between persistency models. While our measurements confirm the results of existing profiling studies, we also report new discoveries and offer new insights. Notably, we find that reads and writes are managed differently, in separate on-DIMM read and write buffers. Comparable in size, the two buffers serve distinct purposes. The read buffer offers higher concurrency and effective on-DIMM prefetching, leading to high read bandwidth and superior sequential performance. However, it does not help hide media access latency. In contrast, the write buffer offers limited concurrency but is a critical stage in a pipeline that supports asynchronous writes in the DDR-T protocol. Surprisingly, in addition to coalescing writes, the write buffer delivers write latency that is lower than read latency and that stays consistent regardless of the working set size, the type of write, the access pattern, or the persistency model. Furthermore, we discover that the mismatch between the cacheline access granularity and the 3D XPoint media access granularity reduces the effectiveness of CPU cache prefetching and wastes persistent memory bandwidth. We propose decoupling reads and writes in the performance analysis and optimization of persistent programs. We present three case studies based on this insight and demonstrate considerable performance improvements. We verify the results on two generations of Optane DCPMM.
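The granularity mismatch mentioned in the abstract can be illustrated with a small back-of-the-envelope sketch. It is not taken from the paper: the 64-byte cache line and 256-byte media block sizes are assumptions (the abstract gives no figures), the function names are hypothetical, and the model idealizes the on-DIMM buffer as transferring each touched media block exactly once.

```python
# Assumed sizes; the abstract does not state them.
CACHE_LINE_BYTES = 64    # assumption: CPU cache line size
MEDIA_BLOCK_BYTES = 256  # assumption: 3D XPoint media access granularity


def media_bytes_touched(offsets, access_bytes=CACHE_LINE_BYTES,
                        block_bytes=MEDIA_BLOCK_BYTES):
    """Bytes moved at the media if each distinct block touched is moved once."""
    blocks = set()
    for off in offsets:
        first = off // block_bytes
        last = (off + access_bytes - 1) // block_bytes
        blocks.update(range(first, last + 1))
    return len(blocks) * block_bytes


def amplification(offsets, access_bytes=CACHE_LINE_BYTES,
                  block_bytes=MEDIA_BLOCK_BYTES):
    """Ratio of media bytes moved to bytes the program actually requested."""
    requested = len(offsets) * access_bytes
    return media_bytes_touched(offsets, access_bytes, block_bytes) / requested


# 16 consecutive cache lines fill 4 media blocks completely.
sequential = [i * CACHE_LINE_BYTES for i in range(16)]
# 16 cache lines, each landing in a different media block.
strided = [i * MEDIA_BLOCK_BYTES for i in range(16)]

print(amplification(sequential))  # 1.0
print(amplification(strided))     # 4.0
```

Under these assumptions, a pattern that touches one cache line per media block moves four times more data at the media than the program requested, which is one way to picture the wasted bandwidth the abstract describes.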
