Updated index to perform fast regular expression search

Proceedings of the III International Scientific and Practical Conference. Pub Date: 2020-03-19. DOI: 10.1145/3388984.3390877

Igor Andrianov, A. Grigorieva, T. Akhmetov


Citations: 0

Abstract

Regular expression search is widely used, including in databases. For example, the LIKE operator was included in the SQL standard about thirty years ago. However, the index types commonly used in DBMSs do little to speed up regular expression search: most such queries require a full scan of all data. One of the most interesting approaches to developing a specialized index is described in [1]. Its authors suggested using a certain subset of variable-length substrings of the input data, called multigrams, as index keys. In this article, we propose changes to the structure of such an index and to the algorithm for constructing it, which achieve two goals. First, the index becomes applicable to a broader class of queries. Second, the index can be updated. We also developed and tested an algorithm for updating the index when new records are inserted into the database. This algorithm takes about two orders of magnitude less time than rebuilding the index completely.
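The general idea of a substring-keyed index that is updated on insert can be illustrated with a minimal sketch. This is not the authors' algorithm and not the multigram index of [1]: it assumes fixed-length trigrams in place of variable-length multigrams, and it assumes the caller supplies by hand a literal that every match must contain, rather than deriving it from the regular expression. The class `GramIndex` and all of its method and parameter names are hypothetical.

```python
import re
from collections import defaultdict


class GramIndex:
    """Toy inverted index: gram -> ids of records containing it.

    Assumption: fixed-length n-grams stand in for multigrams.
    """

    def __init__(self, n=3):
        self.n = n
        self.records = []                 # a record id is its list position
        self.postings = defaultdict(set)  # gram -> set of record ids

    def _grams(self, text):
        # All substrings of length n; empty if the text is shorter than n.
        return {text[i:i + self.n] for i in range(len(text) - self.n + 1)}

    def insert(self, text):
        """Add one record, touching only the postings for its own grams."""
        rid = len(self.records)
        self.records.append(text)
        for gram in self._grams(text):
            self.postings[gram].add(rid)
        return rid

    def search(self, pattern, required_literal):
        """Narrow candidates via the literal's grams, then verify with the regex.

        Assumption: required_literal occurs in every string matching pattern.
        """
        grams = self._grams(required_literal)
        if grams:
            candidates = set.intersection(
                *(self.postings.get(g, set()) for g in grams)
            )
        else:
            # Literal shorter than n: the index cannot help, scan everything.
            candidates = set(range(len(self.records)))
        rx = re.compile(pattern)
        return sorted(r for r in candidates if rx.search(self.records[r]))


idx = GramIndex()
idx.insert("error: disk full")
idx.insert("warning: disk almost full")
idx.insert("error: network down")
print(idx.search(r"disk\s+(almost\s+)?full", "disk"))  # ids of matching records

# Incremental update: the new record is searchable without a rebuild.
idx.insert("info: disk replaced")
print(idx.search(r"disk\s+re", "disk"))
```

In this sketch an insert touches only the posting lists of the grams in the new record, which is why incremental updating can be much cheaper than a full rebuild. The size of that gap in the paper's setting is the authors' measured result and is not reproduced here.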
