Optimizing both performance and tail latency for B+tree on persistent memory
Xianyu He, Chaoshu Yang, Runyu Zhang, Huizhang Luo, Zhichao Cao, Jeff Zhang
Journal of Systems Architecture, volume 163, Article 103406. Published 2025-04-15.
DOI: 10.1016/j.sysarc.2025.103406
URL: https://www.sciencedirect.com/science/article/pii/S1383762125000785
Impact factor: 3.7. JCR: Q1 (Computer Science, Hardware & Architecture).
Citations: 0
Abstract
B+-trees are widely used in databases, and recent studies have optimized them for persistent memory (PM). However, existing PM-oriented B+-tree designs suffer from write performance penalties, high tail latency, and scalability issues. These stem from three critical design limitations, whose effects are amplified on PM by its asymmetric read and write performance: (1) node splits can lead to massive data migration; (2) frequent node splits can lead to high overhead from cascading modification; (3) node revision can lead to inefficient parallelism. In this paper, we propose HLTree, a novel B+-tree-based index for PM with High write performance and Low tail latency, to solve these issues and optimize both performance and tail latency for PM-oriented B+-trees. First, HLTree employs a new node pre-split strategy to reduce the write overhead of legacy B+-tree designs. Second, HLTree decouples structural modification operations from the critical path of the B+-tree and completes them asynchronously to reduce the overhead of cascading modification. Finally, HLTree optimizes optimistic version locks to reduce conflicts among readers and writers for lower latency and better scalability. In evaluations conducted on Intel Optane DCPMM, compared with μTree/SSB-Tree/Fast&Fair/FPTree, HLTree provides 1.06×/2.38×/2.16×/1.55× the read throughput and 1.50×/2.28×/2.13×/1.58× the write throughput on average, respectively. Moreover, HLTree reduces the 99.9th percentile tail latency by up to one order of magnitude.
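The abstract mentions optimistic version locks without describing the mechanism. As a rough illustration of the generic technique only (not HLTree's optimized variant, which the abstract does not detail), the sketch below shows a reader that takes no lock and retries if a version counter changed while it was reading. The class `VersionedNode` and all of its fields are hypothetical names chosen for this example.

```python
import threading


class VersionedNode:
    """Toy node guarded by a generic optimistic version lock.

    Hypothetical illustration; not HLTree's node layout or protocol.
    """

    def __init__(self):
        self.version = 0  # even = stable, odd = a writer is modifying the node
        self._writer_mutex = threading.Lock()  # serializes writers only
        self.entries = {}

    def write(self, key, value):
        with self._writer_mutex:
            self.version += 1  # becomes odd: concurrent readers will retry
            self.entries[key] = value
            self.version += 1  # becomes even: new state is published

    def read(self, key):
        while True:
            before = self.version
            if before % 2 == 1:
                continue  # a writer is active, try again
            value = self.entries.get(key)
            if self.version == before:
                return value  # no writer intervened during the read


node = VersionedNode()
node.write(42, "payload")
result = node.read(42)
```

Readers never block writers in this scheme; the cost is that a read may be repeated when it overlaps a write, which is the kind of reader/writer conflict the abstract says HLTree aims to reduce.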
About the journal:
The Journal of Systems Architecture: Embedded Software Design (JSA) is a journal covering all design and architectural aspects related to embedded systems and software. It ranges from the microarchitecture level via the system software level up to the application-specific architecture level. Aspects such as real-time systems, operating systems, FPGA programming, programming languages, communications (limited to analysis and the software stack), mobile systems, parallel and distributed architectures as well as additional subjects in the computer and system architecture area will fall within the scope of this journal. Technology will not be a main focus, but its use and relevance to particular designs will be. Case studies are welcome but must contribute more than just a design for a particular piece of software.
Design automation of such systems, including methodologies, techniques and tools for their design, as well as novel designs of software components fall within the scope of this journal. Novel applications that use embedded systems are also central to this journal. While hardware is not a part of this journal, hardware/software co-design methods that consider the interplay between software and hardware components with an emphasis on software are also relevant here.