Updated index to perform fast regular expression search
Igor Andrianov, A. Grigorieva, T. Akhmetov
Proceedings of the III International Scientific and Practical Conference, 2020-03-19
DOI: https://doi.org/10.1145/3388984.3390877
Regular expression search is widely used, including in databases. For example, the LIKE operator was included in the SQL standard about thirty years ago. However, the index types commonly used in DBMSs do little to speed up regular expression search: most such queries require a full scan of all data. One of the most interesting approaches to building a specialized index is described in [1]. Its authors suggested using a certain subset of variable-length substrings of the input data, called multigrams, as index keys. In this article, we propose changes to the structure of such an index and to the algorithm for constructing it, which achieve two goals. First, the index becomes applicable to a broader class of queries. Second, the index becomes updatable. We also developed and tested an algorithm for updating the index when new records are inserted into the database. This algorithm updates the index in two orders of magnitude less time than a complete rebuild.
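To make the general idea concrete, here is a minimal Python sketch of a substring-keyed inverted index that narrows the candidate set before the regular expression is run, and that accepts new records without a rebuild. It is an illustration under simplifying assumptions, not the construction from [1] and not the algorithm proposed in this article: it uses fixed-length trigrams rather than a selected subset of variable-length multigrams, and the caller supplies a literal that every match must contain instead of the index deriving it from the pattern. The names `GramIndex`, `insert`, `search`, and `required_literal` are hypothetical.

```python
import re
from collections import defaultdict


class GramIndex:
    """Toy inverted index: fixed-length n-grams -> ids of records containing them.

    Assumption: fixed-length grams stand in for the variable-length
    multigrams discussed in the text.
    """

    def __init__(self, n=3):
        self.n = n
        self.records = []                  # a record id is its list position
        self.postings = defaultdict(set)   # gram -> set of record ids

    def _grams(self, text):
        # All distinct substrings of length n (empty set if text is shorter).
        return {text[i:i + self.n] for i in range(len(text) - self.n + 1)}

    def insert(self, text):
        """Add one record, touching only the posting sets of its own grams."""
        rid = len(self.records)
        self.records.append(text)
        for gram in self._grams(text):
            self.postings[gram].add(rid)
        return rid

    def search(self, pattern, required_literal):
        """Return sorted ids of records that match `pattern`.

        `required_literal` is a string the caller knows every match must
        contain; extracting it from the regex is out of scope for this sketch.
        """
        grams = self._grams(required_literal)
        if grams:
            # A matching record must contain every gram of the literal.
            candidates = set.intersection(
                *(self.postings.get(g, set()) for g in grams)
            )
        else:
            # Literal shorter than n: the index cannot help, scan everything.
            candidates = set(range(len(self.records)))
        regex = re.compile(pattern)
        return sorted(rid for rid in candidates if regex.search(self.records[rid]))


index = GramIndex()
for row in ["error: disk full", "warning: disk almost full", "error: network down"]:
    index.insert(row)

print(index.search(r"error: .*full", "error"))   # [0]

# Inserting a record updates only its own grams; nothing is rebuilt.
index.insert("error: cache full")
print(index.search(r"error: .*full", "error"))   # [0, 3]
```

In this sketch the cost of an insertion depends on the length of the new record rather than on the size of the database, which is the property that makes incremental updates cheaper than reconstruction. The regex is still run on every candidate, so the index only filters; it never replaces the final match.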