Intermediate Value Linearizability: A Quantitative Correctness Criterion
Authors: Arik Rinberg, Idit Keidar
Journal: Journal of the ACM
Published: 2023-04-18
DOI: https://dl.acm.org/doi/10.1145/3584699
Abstract
Big data processing systems often employ batched updates and data sketches to estimate certain properties of large data. For example, a CountMin sketch approximates the frequencies at which elements occur in a data stream, and a batched counter counts events in batches. This article focuses on correctness criteria for concurrent implementations of such objects. Specifically, we consider quantitative objects, whose return values are from an ordered domain, with a particular emphasis on (ε,δ)-bounded objects that estimate a numerical quantity with an error of at most ε with probability at least 1 - δ.
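The (ε,δ)-bounded property described above can be written as a single inequality. This is a reading of the abstract's wording, not the paper's formal definition; the symbols $v$ (the true quantity) and $\hat{v}$ (the value returned by the object) are names assumed here for illustration.

```latex
\Pr\bigl[\, |\hat{v} - v| \le \varepsilon \,\bigr] \;\ge\; 1 - \delta
```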
The de facto correctness criterion for concurrent objects is linearizability. Intuitively, under linearizability, when a read overlaps an update, it must return the object’s value either before the update or after it. Consider, for example, a single batched increment operation that counts three new events, bumping a batched counter’s value from 7 to 10. In a linearizable implementation of the counter, a read overlapping this update must return either 7 or 10. We observe, however, that in typical use cases, any intermediate value between 7 and 10 would also be acceptable. To capture this additional degree of freedom, we propose Intermediate Value Linearizability (IVL), a new correctness criterion that relaxes linearizability to allow returning intermediate values, for instance, 8 in the example above. Roughly speaking, IVL allows reads to return any value that is bounded between two return values that are legal under linearizability.
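The counter example can be checked mechanically. The sketch below is only an illustration of the rough statement above for a single read overlapping a single update; the function names are hypothetical and this is not the paper's formal definition of IVL.

```python
def linearizable_read_values(before, after):
    """Values a read overlapping one update may return under linearizability."""
    return {before, after}


def ivl_allows(value, legal_values):
    """Rough IVL check: value lies between two linearizable return values."""
    return min(legal_values) <= value <= max(legal_values)


# The batched increment from the text: the counter goes from 7 to 10.
legal = linearizable_read_values(7, 10)

for candidate in (6, 7, 8, 9, 10, 11):
    print(candidate,
          "linearizable" if candidate in legal else "not linearizable",
          "IVL-ok" if ivl_allows(candidate, legal) else "IVL-violation")
```

Under this check, 8 and 9 are rejected by linearizability but accepted by IVL, while 6 and 11 are rejected by both.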
A key feature of IVL is that we can prove that concurrent IVL implementations of (ε,δ)-bounded objects are themselves (ε,δ)-bounded. To illustrate the power of this result, we give a straightforward and efficient concurrent implementation of an (ε,δ)-bounded CountMin sketch, which is IVL (albeit not linearizable).
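For readers unfamiliar with the data structure, here is a minimal sequential CountMin sketch. It is not the paper's concurrent implementation. It assumes the textbook sizing (width ⌈e/ε⌉, depth ⌈ln(1/δ)⌉, with 0 < ε and 0 < δ < 1), under which an estimate never undercounts and overcounts by at most ε times the total stream count with probability at least 1 − δ. Salted BLAKE2b stands in for the pairwise-independent hash family that the analysis assumes, and the class and method names are hypothetical.

```python
import hashlib
import math


class CountMinSketch:
    """Sequential CountMin sketch sized from (epsilon, delta)."""

    def __init__(self, epsilon, delta):
        self.width = math.ceil(math.e / epsilon)      # counters per row
        self.depth = math.ceil(math.log(1 / delta))   # number of rows
        self.rows = [[0] * self.width for _ in range(self.depth)]

    def _index(self, row, item):
        # One hash function per row, derived by salting with the row number.
        digest = hashlib.blake2b(
            repr(item).encode(), salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest[:8], "little") % self.width

    def update(self, item, count=1):
        # Add `count` to one counter in every row.
        for r in range(self.depth):
            self.rows[r][self._index(r, item)] += count

    def query(self, item):
        # The smallest counter is the least polluted by hash collisions.
        return min(self.rows[r][self._index(r, item)]
                   for r in range(self.depth))


sketch = CountMinSketch(epsilon=0.01, delta=0.01)
for _ in range(5):
    sketch.update("a")
sketch.update("b", 3)
print(sketch.width, sketch.depth, sketch.query("a"), sketch.query("b"))
```

Because every counter only grows, a query that runs while updates are in flight reads values somewhere between the "before" and "after" states of those updates, which is the kind of behavior IVL is meant to admit.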
We present four examples of IVL objects, each showcasing a different way of using IVL. The first is a simple wait-free IVL batched counter, with O(1) step complexity for update. The next considers an (ε,δ)-bounded CountMin sketch and further shows how to relax IVL using the notion of r-relaxation. Our third example is a non-atomic iterator over a data structure. In this example, we augment the data structure with an auxiliary history variable state that includes “tombstones” for items deleted from the data structure. Here, IVL semantics are required at the augmented level. Finally, using a priority queue, we show that some objects require IVL to be paired with other correctness criteria; indeed, a natural correctness notion for a concurrent priority queue is IVL coupled with sequential consistency.
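One way to obtain a batched counter with a constant-step update is to give each thread its own slot and let a read sum all slots. The abstract does not spell out the construction, so the sketch below is an assumption made for illustration, and every name in it is hypothetical. A read that overlaps updates may see some slots before and others after an increment, so it can return an intermediate value.

```python
import threading


class BatchedCounter:
    """Per-thread slots: update touches one slot, read sums all of them."""

    def __init__(self, n_threads):
        self.slots = [0] * n_threads

    def update(self, tid, batch):
        # Only thread `tid` ever writes slots[tid] (single-writer slot).
        self.slots[tid] += batch

    def read(self):
        # Not an atomic snapshot: slots are read one after another.
        return sum(self.slots)


def run_demo(n_threads=4, updates_per_thread=1000, batch=3):
    counter = BatchedCounter(n_threads)
    observed = []

    def writer(tid):
        for _ in range(updates_per_thread):
            counter.update(tid, batch)

    def reader():
        for _ in range(200):
            observed.append(counter.read())

    threads = [threading.Thread(target=writer, args=(t,))
               for t in range(n_threads)]
    threads.append(threading.Thread(target=reader))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.read(), observed


final_value, observed_reads = run_demo()
print(final_value, len(observed_reads))
```

Every concurrent read falls between 0 and the final total, and successive reads by the same reader never decrease, because each slot only grows.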
Last, we show that IVL allows for inherently cheaper implementations than linearizable ones. In particular, we show a lower bound of Ω(n) on the step complexity of the update operation of any wait-free linearizable batched counter from single-writer multi-reader registers, which is more expensive than our O(1) IVL implementation.
About the journal:
The best indicator of the scope of the journal is provided by the areas covered by its Editorial Board. These areas change from time to time, as the field evolves. The following areas are currently covered by a member of the Editorial Board: Algorithms and Combinatorial Optimization; Algorithms and Data Structures; Algorithms, Combinatorial Optimization, and Games; Artificial Intelligence; Complexity Theory; Computational Biology; Computational Geometry; Computer Graphics and Computer Vision; Computer-Aided Verification; Cryptography and Security; Cyber-Physical, Embedded, and Real-Time Systems; Database Systems and Theory; Distributed Computing; Economics and Computation; Information Theory; Logic and Computation; Logic, Algorithms, and Complexity; Machine Learning and Computational Learning Theory; Networking; Parallel Computing and Architecture; Programming Languages; Quantum Computing; Randomized Algorithms and Probabilistic Analysis of Algorithms; Scientific Computing and High Performance Computing; Software Engineering; Web Algorithms and Data Mining