Updated index to perform fast regular expression search

Igor Andrianov, A. Grigorieva, T. Akhmetov
Proceedings of the III International Scientific and Practical Conference, published 2020-03-19
DOI: 10.1145/3388984.3390877
Citations: 0

Abstract

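Regular expression search is widely used, including in databases; for example, the LIKE operator was added to the SQL standard about thirty years ago. However, the index types commonly used in DBMSs do little to speed up regular expression search, so most such queries require a full scan of all data. One of the most interesting approaches to building a specialized index is described in [1]: its authors proposed using a certain subset of variable-length substrings of the input data, called multigrams, as index keys. In this article, we propose changes to the structure of such an index and to the algorithm for constructing it, which achieve two goals. First, the index becomes applicable to a broader class of queries. Second, the index can now be updated. We also developed and tested an algorithm for updating the index when new records are inserted into the database; it updates the index two orders of magnitude faster than a complete reconstruction.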
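The abstract does not spell out the index structure, so the following is only a minimal Python sketch of the general idea behind a multigram index: substring keys mapped to posting lists, candidate filtering followed by regex verification, and an insert that appends to existing posting lists instead of rebuilding. It is not the authors' algorithm nor the one from [1]. The class name `MultigramIndex`, the parameters `max_len` and `threshold`, the rule of keeping only grams that occur in at most a `threshold` fraction of records, and the assumption that the caller supplies a literal substring every match must contain are all hypothetical choices made for illustration.

```python
import re
from collections import defaultdict


class MultigramIndex:
    """Hypothetical, simplified multigram inverted index (illustration only).

    Keys are substrings of length 1..max_len that occur in at most
    `threshold` * (number of records) records when the index is built.
    """

    def __init__(self, records, max_len=3, threshold=0.5):
        self.records = list(records)
        self.max_len = max_len
        self.threshold = threshold
        self.postings = self._build()

    def _grams(self, text):
        # All distinct substrings of length 1..max_len occurring in `text`.
        return {text[i:i + n]
                for n in range(1, self.max_len + 1)
                for i in range(len(text) - n + 1)}

    def _build(self):
        # Full construction: collect the records containing each gram,
        # then keep only the selective grams as index keys.
        occurrences = defaultdict(set)
        for rid, text in enumerate(self.records):
            for gram in self._grams(text):
                occurrences[gram].add(rid)
        limit = self.threshold * len(self.records)
        return {g: ids for g, ids in occurrences.items() if len(ids) <= limit}

    def insert(self, text):
        # Incremental update: add the new record id to the posting list of
        # every already-indexed gram it contains. No keys are added or
        # dropped, so each posting list stays complete.
        rid = len(self.records)
        self.records.append(text)
        for gram in self._grams(text):
            if gram in self.postings:
                self.postings[gram].add(rid)
        return rid

    def search(self, pattern, literal):
        # `literal` must be a substring of every match (assumed to be
        # extracted from `pattern` by the caller).
        candidates = None
        for gram in self._grams(literal):
            if gram in self.postings:
                ids = self.postings[gram]
                candidates = set(ids) if candidates is None else candidates & ids
        if candidates is None:
            # No indexed gram in the literal: fall back to a full scan.
            candidates = range(len(self.records))
        rx = re.compile(pattern)
        # Verify each candidate against the actual regular expression.
        return sorted(rid for rid in candidates if rx.search(self.records[rid]))


if __name__ == "__main__":
    idx = MultigramIndex(
        ["database index", "regular expression", "fast search", "index update"],
        max_len=3, threshold=0.5)
    print(idx.search(r"ind\w+", literal="ind"))  # [0, 3]
    idx.insert("binding")
    print(idx.search(r"ind\w+", literal="ind"))  # [0, 3, 4]
```

Because `insert()` only appends to posting lists of existing keys, every indexed gram still maps to all records containing it, so the filter never discards a true match. The key set can drift from what a full rebuild would choose (rebuilding over the same five records with `threshold=0.5` indexes none of the grams of "ind"), which changes only how selective the filter is. This hints at why an incremental update can be far cheaper than reconstruction, although the paper's actual update algorithm may work differently.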