Online bit flip detection for in-memory B-trees on unreliable hardware

Till Kolditz, T. Kissinger, B. Schlegel, Dirk Habich, Wolfgang Lehner
{"title":"Online bit flip detection for in-memory B-trees on unreliable hardware","authors":"Till Kolditz, T. Kissinger, B. Schlegel, Dirk Habich, Wolfgang Lehner","doi":"10.1145/2619228.2619233","DOIUrl":null,"url":null,"abstract":"Hardware vendors constantly decrease the feature sizes of integrated circuits to obtain better performance and energy efficiency. Due to cosmic rays, low voltage or heat dissipation, hardware -- both processors and memory -- becomes more and more unreliable as the error rate increases. From a database perspective bit flip errors in main memory will become a major challenge for modern in-memory database systems, which keep all their enterprise data in volatile, unreliable main memory. Although existing hardware error control techniques like ECC-DRAM are able to detect and correct memory errors, their detection and correction capabilities are limited. Moreover, hardware error correction faces major drawbacks in terms of acquisition costs, additional memory utilization, and latency. In this paper, we argue that slightly increasing data redundancy at the right places by incorporating context knowledge already increases error detection significantly. We use the B-Tree -- as a widespread index structure -- as an example and propose various techniques for online error detection and thus increase its overall reliability. In our experiments, we found that our techniques can detect more errors in less time on commodity hardware compared to non-resilient B-Trees running in an ECC-DRAM environment. Our techniques can further be easily adapted for other data structures and are a first step in the direction of resilient database systems which can cope with unreliable hardware.","PeriodicalId":298901,"journal":{"name":"International Workshop on Data Management on New Hardware","volume":"126 2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Workshop on Data Management on New Hardware","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2619228.2619233","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 8

Abstract

Hardware vendors constantly decrease the feature sizes of integrated circuits to obtain better performance and energy efficiency. Due to cosmic rays, low voltage or heat dissipation, hardware -- both processors and memory -- becomes more and more unreliable as the error rate increases. From a database perspective bit flip errors in main memory will become a major challenge for modern in-memory database systems, which keep all their enterprise data in volatile, unreliable main memory. Although existing hardware error control techniques like ECC-DRAM are able to detect and correct memory errors, their detection and correction capabilities are limited. Moreover, hardware error correction faces major drawbacks in terms of acquisition costs, additional memory utilization, and latency. In this paper, we argue that slightly increasing data redundancy at the right places by incorporating context knowledge already increases error detection significantly. We use the B-Tree -- as a widespread index structure -- as an example and propose various techniques for online error detection and thus increase its overall reliability. In our experiments, we found that our techniques can detect more errors in less time on commodity hardware compared to non-resilient B-Trees running in an ECC-DRAM environment. Our techniques can further be easily adapted for other data structures and are a first step in the direction of resilient database systems which can cope with unreliable hardware.
不可靠硬件上内存b树的在线位翻转检测
硬件厂商不断减小集成电路的特征尺寸,以获得更好的性能和能源效率。由于宇宙射线、低电压或散热,随着错误率的增加,硬件(包括处理器和内存)变得越来越不可靠。从数据库的角度来看,主存中的位翻转错误将成为现代内存数据库系统的主要挑战,这些系统将所有企业数据保存在易失的、不可靠的主存中。虽然现有的硬件错误控制技术,如ECC-DRAM能够检测和纠正内存错误,但它们的检测和纠正能力是有限的。此外,硬件纠错在获取成本、额外内存利用和延迟方面面临主要缺点。在本文中,我们认为通过结合上下文知识在适当的地方略微增加数据冗余已经显著提高了错误检测。我们使用b树作为一个广泛的索引结构作为一个例子,并提出了各种在线错误检测技术,从而提高了它的整体可靠性。在我们的实验中,我们发现,与在ECC-DRAM环境中运行的非弹性b - tree相比,我们的技术可以在更短的时间内在商用硬件上检测到更多错误。我们的技术可以很容易地进一步适用于其他数据结构,并且是弹性数据库系统的第一步,可以应对不可靠的硬件。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信