Improving the performance of the BDB system by changing the query processing temporary file structures

ACM-SE 28 Pub Date : 1990-04-01 DOI:10.1145/98949.99028

R. Trueblood, P. Lai

Abstract

A performance study of the BDB system, a highly modular relational database management system, examines the effects of changing the query processor's temporary file access method from the direct access method to the record sequential access method. A built-in software monitor is used to measure query processing response time. Experimental results show that performance can be improved for selective type queries. For join type queries, two join algorithms, "nested-loop" and "sort-merge," are investigated. Interestingly, the direct access method performed better for the nested-loop join implementation, while the record sequential access method performed better for the sort-merge implementation. The observed outcomes of these experiments are reported and discussed in [1]. The study is conducted on non-indexed data. The BDB system is benchmarked by conducting a series of controlled experiments. The experiments used three relations, each containing two attributes. Collectively, the relation sizes were 1, 100, 500, 1,000, 1,500, 7,500, and 10,000 tuples. For the record sequential access files, the buffer size was set to 3K, 6K, 18K, and 30K. Changing the buffer size from 3K up to 30K resulted in only a 1 to 2 percent improvement. This small improvement seems to contradict the well-known fact that larger buffers reduce I/O costs. Some possible reasons for this contradiction are that the physical data path of the microcomputer is too small to allow large buffers to be used efficiently and that the operating system reads and/or writes disk sectors of fixed size. A set of nine test queries is used to obtain response time measurements from the query processor. Briefly, some of the queries selected all of the tuples of a relation, some selected only one tuple, some selected half of the tuples, and others joined the relations.
The investigation of whether the record sequential access method is better than the direct access method for supporting temporary files created during query processing yielded several interesting findings. First, the investigation revealed some inefficient code, such as rereading data already in the buffer area and excessive copying of data from one buffer area to another. Once this code was improved, performance was enhanced by about 85%. Second, the record sequential access method offered a 0-14% improvement over the direct access method for selective type queries. Specifically, for selecting one tuple there was no improvement, for selecting one-half of the relation there was a 10% improvement, and for selecting the whole relation there was a 14% improvement. Two algorithms for the join were investigated. The nested-loop algorithm, which handles the many-to-many mapping, performed better by about 50% when the direct access method is used. The sort-merge algorithm, which handles the one-to-many mapping, performed better by about 40% when the record sequential access method is used. A possible reason for the nested-loop result is that the procedure requires rereading one of the temporary files. In order to reread a record sequential file, the file must be closed and then reopened. For direct access files, the file remains open, and the record counter is reset to the first record in the file, which eliminates the need to close and reopen the file. Thus, both join algorithms should be included in the BDB system and used appropriately depending on the mapping relationship between the two files being joined.

References

1. Trueblood, R. and Lai, P., "Improving the Performance of the BDB System by Changing the Temporary File Structure used for Query Processing," Tech. Report No. TR90003, Department of Computer Science, University of South Carolina, Columbia, South Carolina 29208 (1990).
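
The rescan behaviour that the abstract offers as a possible explanation for the join results can be illustrated with a small sketch. This is not the BDB code: the function names, the CSV temporary-file format, and the use of `seek(0)` as a stand-in for resetting a direct access record counter are all assumptions made for illustration. The nested-loop join rescans the inner temporary file once per outer tuple, either by rewinding it or by closing and reopening it, while the sort-merge join for a one-to-many mapping reads each sorted input once.

```python
import csv
import os
import tempfile


def write_temp_relation(rows):
    """Write two-attribute tuples to a temporary CSV file; return its path."""
    fd, path = tempfile.mkstemp(suffix=".tmp", text=True)
    with os.fdopen(fd, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    return path


def nested_loop_join(outer_path, inner_path, rewind=True):
    """Many-to-many join on the first attribute.

    The inner file is rescanned for every outer tuple.
    rewind=True  : keep the file open and seek back to the start
                   (assumed analogue of resetting a record counter).
    rewind=False : close and reopen the file before each rescan
                   (assumed analogue of a record sequential file).
    """
    result = []
    with open(outer_path, newline="") as outer:
        inner = open(inner_path, newline="")
        try:
            for o in csv.reader(outer):
                if rewind:
                    inner.seek(0)
                else:
                    inner.close()
                    inner = open(inner_path, newline="")
                for i in csv.reader(inner):
                    if o[0] == i[0]:
                        result.append((o[0], o[1], i[1]))
        finally:
            inner.close()
    return result


def sort_merge_join(one_side, many_side):
    """One-to-many join on the first attribute; keys in one_side are unique."""
    left = sorted(one_side, key=lambda t: t[0])
    right = sorted(many_side, key=lambda t: t[0])
    result, j = [], 0
    for key, a in left:
        # Skip right tuples whose key is smaller than the current left key.
        while j < len(right) and right[j][0] < key:
            j += 1
        # Emit every matching right tuple; no backing up is needed because
        # each left key occurs only once.
        while j < len(right) and right[j][0] == key:
            result.append((key, a, right[j][1]))
            j += 1
    return result
```

Both settings of `rewind` return the same join result; only the way the inner file is repositioned differs, which is the cost the abstract suggests may separate the two access methods. The sketch makes no claim about the size of that cost on any particular system.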