Coarse indices for a tape-based data warehouse
T. Johnson
Proceedings 14th International Conference on Data Engineering, 1998-02-23
DOI: 10.1109/ICDE.1998.655781
Citations: 7
Abstract
Data warehouses allow users to make sense of large quantities of detail data. While most queries can be answered through summary data, some queries can only be answered by accessing the detail data. It is usually not cost-effective to store terabytes of detail data online; instead, the detail data is stored on tape. The problem we address in this paper is how to index tape-based detail data. Conventional indices on tens of terabytes of data can themselves require terabytes of storage. We propose the use of coarse indices for tape-based detail data. Instead of specifying the location of every record containing a particular key, a coarse index specifies whether or not a region of tape contains at least one record with that key value. Our proposal is based on the observation that while long tape seeks are fast, short tape seeks are slow. Therefore, indices that point to the exact location of a record on tape do not provide enough performance benefit to justify the cost of their storage; a few bits pointing to the appropriate region are enough. In this paper, we present the design of such a coarse index and provide fast algorithms for updating and querying it. Our experiments on a large data set taken from an existing data warehouse show that compressed bitmap indices offer an order-of-magnitude reduction in index size, permitting the coarse indices to be stored online. Analytical and simulation models of the time to fetch selected records from tape show that coarse indices almost always reduce the total loading time compared with dense tape-based indices or with no index at all.
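The abstract does not spell out the data structure, so the following is only a minimal sketch of the idea under stated assumptions, not the paper's design. It assumes the tape is divided into numbered regions and that each key value gets one bitmap over those regions, where bit r is set if region r holds at least one record with that key. It also assumes simple run-length encoding for the bitmaps. The abstract says only "compressed bitmap indices," so the compression scheme here is an assumption. The class and method names (`CoarseIndex`, `append_region`, `regions_for`) and the example keys are hypothetical.

```python
from collections import defaultdict


class CoarseIndex:
    """Minimal sketch of a coarse tape index (illustrative, not the paper's code).

    The tape is split into regions numbered 0, 1, 2, ...  For each key value we
    keep a bitmap over regions: bit r is 1 iff region r holds at least one
    record with that key.  Each bitmap is stored run-length encoded as
    [bit, length] runs (an assumed compression scheme); positions past the
    stored runs are implicitly 0.
    """

    def __init__(self):
        self.num_regions = 0
        self._runs = defaultdict(list)   # key -> [[bit, length], ...]
        self._length = defaultdict(int)  # key -> regions covered by its runs

    def _extend(self, key, bit, count):
        """Append `count` copies of `bit` to the RLE bitmap of `key`."""
        if count <= 0:
            return
        runs = self._runs[key]
        if runs and runs[-1][0] == bit:
            runs[-1][1] += count          # lengthen the current run
        else:
            runs.append([bit, count])     # start a new run
        self._length[key] += count

    def append_region(self, keys_in_region):
        """Update: index a newly written region at the end of the tape."""
        region = self.num_regions
        for key in set(keys_in_region):
            # Zero-fill regions since this key's last set bit, then set this one.
            self._extend(key, 0, region - self._length[key])
            self._extend(key, 1, 1)
        self.num_regions += 1
        return region

    def regions_for(self, keys):
        """Query: sorted regions that hold at least one record with any of `keys`."""
        hits = set()
        for key in keys:
            pos = 0
            for bit, length in self._runs.get(key, []):
                if bit:
                    hits.update(range(pos, pos + length))
                pos += length
        return sorted(hits)


idx = CoarseIndex()
idx.append_region(["NJ", "NY"])       # region 0
idx.append_region(["CA"])             # region 1
idx.append_region(["NY", "NY"])       # region 2
print(idx.regions_for(["NY"]))        # [0, 2]
print(idx.regions_for(["CA", "TX"]))  # [1]
```

Because a query returns whole regions rather than record offsets, a loader built on this sketch would read each returned region sequentially instead of seeking record by record. That is the trade-off the abstract argues for. Long stretches of regions without a given key collapse into a single `[0, n]` run, which is where the size reduction comes from in this sketch.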