A High Effective Indexing and Retrieval Method Providing Block-Level Timely Recovery to Any Point-in-Time
Authors: Yonghong Sheng, Dan Xu, Dongsheng Wang
Published in: 2010 IEEE Fifth International Conference on Networking, Architecture, and Storage (2010-07-15)
DOI: 10.1109/NAS.2010.63 (https://doi.org/10.1109/NAS.2010.63)
Citations: 4
Abstract
Block-level continuous data protection (CDP) logs every disk write operation so that the disk can be rolled back to any point in time within a time window. Because every update operation is time-stamped and logged, indexing such a huge number of records is an important and challenging problem. Unfortunately, conventional indexing methods cannot efficiently record large numbers of versions or support instant “time-travel” queries in CDP. In this paper, we present an effective indexing method that provides timely recovery to any point in time in comprehensive versioning systems, called the Hierarchical Spatial-Temporal Indexing Method (HSTIM). The basic principle of HSTIM is to partition the time domain into time slices and the production storage LBAs into segments, according to the update frequency of disk I/Os, and to build a separate index file for each segment. To meet the demand for an instant view of history data, the metadata of production storage is indexed independently. For retrieval of long-term history data, HSTIM introduces index snapshots to reduce the retrieval time. Another distinctive feature of HSTIM is its incremental retrieval method, which achieves high query performance at time point t + Δt if the neighboring time point t has been queried previously. The paper compares HSTIM with the traditional B+-tree and multi-version B-tree (MVBT) indexes in several respects. Experiments with real workload I/O trace files show that HSTIM can locate history data within 8.05 seconds for a recovery point of 48 hours, while the B+-tree takes 24.04 seconds. If the index snapshot is applied, HSTIM can reduce this retrieval time to within 3 seconds.
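The partitioning and incremental-retrieval ideas in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the class name SegmentedTimeIndex, its methods (log_write, view_at, advance) and its parameters (segment_blocks, slice_seconds) are hypothetical, the partitions are fixed-size rather than chosen by update frequency, the index lives in an in-memory dictionary rather than per-segment index files, and index snapshots are left out. It also assumes writes are logged in non-decreasing timestamp order.

```python
from collections import defaultdict


class SegmentedTimeIndex:
    """Toy CDP index (hypothetical): one bucket per (time slice, LBA segment)."""

    def __init__(self, segment_blocks=1024, slice_seconds=3600):
        # Assumption: fixed-size partitions; the paper sizes them by update frequency.
        self.segment_blocks = segment_blocks
        self.slice_seconds = slice_seconds
        # (slice_id, segment_id) -> [(timestamp, lba, log_offset), ...] in arrival order
        self.buckets = defaultdict(list)

    def _slice_of(self, timestamp):
        return int(timestamp // self.slice_seconds)

    def log_write(self, timestamp, lba, log_offset):
        """Record that the write to `lba` at `timestamp` is stored at `log_offset`."""
        key = (self._slice_of(timestamp), lba // self.segment_blocks)
        self.buckets[key].append((timestamp, lba, log_offset))

    def _apply(self, view, t_low, t_high):
        """Replay writes with t_low < timestamp <= t_high onto `view`, oldest slice first.

        t_low=None means "no lower bound". Slices outside the range are skipped whole.
        """
        last = self._slice_of(t_high)
        first = None if t_low is None else self._slice_of(t_low)
        for (slice_id, _segment), entries in sorted(
            self.buckets.items(), key=lambda item: item[0]
        ):
            if slice_id > last or (first is not None and slice_id < first):
                continue
            for timestamp, lba, log_offset in entries:
                if (t_low is None or timestamp > t_low) and timestamp <= t_high:
                    view[lba] = log_offset  # a later write overrides an earlier one
        return view

    def view_at(self, t):
        """Full query: map each LBA to the log offset of its newest write at or before t."""
        return self._apply({}, None, t)

    def advance(self, view, t_prev, t_next):
        """Incremental query: derive the view at t_next (>= t_prev) from the view at t_prev."""
        return self._apply(dict(view), t_prev, t_next)


index = SegmentedTimeIndex(segment_blocks=4, slice_seconds=10)
index.log_write(1, 0, 100)
index.log_write(5, 5, 101)
index.log_write(12, 0, 102)
index.log_write(25, 5, 103)

view_t = index.view_at(10)                  # full scan of slices up to t = 10
view_later = index.advance(view_t, 10, 30)  # touches only slices covering (10, 30]
print(view_t, view_later)
```

In this sketch the incremental query only replays the buckets whose time slice overlaps the interval between the two query points, which is the intuition behind the cheaper lookup at t + Δt once t has been queried.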