Query Evaluation over SLP-Represented Document Databases with Complex Document Editing

Markus L. Schmid, Nicole Schweikardt
Published in: Proceedings of the 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems
Publication date: 2022-06-12
DOI: 10.1145/3517804.3524158 (https://doi.org/10.1145/3517804.3524158)
Citations: 4

Abstract

It is known that the query result of a regular spanner over a single document D can be enumerated after O(|D|) preprocessing and with constant delay in data complexity (Florenzano et al., ACM TODS 2020, Amarilli et al., ACM TODS 2021). It has been shown (Schmid and Schweikardt, PODS'21) that if the document is represented by a straight-line program (SLP) S, then enumeration is possible with a delay of O(log |D|), but with preprocessing that is linear in |S| (which, in the best case, is logarithmic in |D|). Hence, this compressed setting allows for spanner evaluation in sub-linear time, i.e., with logarithmic upper bounds for preprocessing and delay, if the document is highly compressible.

In this work, we extend these results to the dynamic setting. We consider a document database DDB = D1, D2, ..., Dm that is represented by an SLP SDDB, and that supports regular spanners M1, M2, ..., Mk (meaning that we have data structures at our disposal that allow O(log |Di|)-delay enumeration of the result of spanner Mj on document Di). Then we can perform an update by manipulating the existing documents of DDB by a sequence of text-editing operations commonly found in text editors (such as copy and paste, deleting or copying factors, and concatenating documents), and add the document constructed in this way to the database. Such an operation is called complex document editing and is given by an expression φ in a suitable algebra. Moreover, after this operation, the document database still supports all the regular spanners M1, ..., Mk. The total time required for such an update is O(k |φ| log d), where d is the maximum length of any intermediate document constructed in the complex document editing described by φ.

We stress the fact that the size |SDDB| of the SLP (which upper bounds the preprocessing in the static case) is potentially logarithmic in the data, but generally depends on the compressibility of the documents (in the worst case, it is even linear in the data). In contrast to that, we can guarantee that the dependency on the data of our updates is logarithmic regardless of the actual compression achieved by the SLP. In particular, any such update performed by complex document editing adds documents whose length may be exponentially larger than the time needed for performing such an update. Our approach hinges on balancing properties of SLPs, and for our updates it is vital to manipulate the SLP that represents the database in such a way that these balancing properties are maintained.
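
To make the notion of an SLP-represented document concrete, the following minimal Python sketch may help. It is an illustration only and not the authors' construction: the class `SLP` and its methods `add_terminal`, `add_concat`, `char_at`, and `expand` are hypothetical names chosen here, and the sketch implements neither the balancing properties nor the spanner data structures that the abstract relies on. It only shows that an SLP with a few dozen rules can represent a document with about 2^41 characters, and that a single position can be read by descending through the rules instead of decompressing.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple, Union

# A rule is either a single terminal character or a pair of nonterminal names.
Rule = Union[str, Tuple[str, str]]


@dataclass
class SLP:
    """Toy straight-line program (hypothetical helper, not from the paper)."""
    rules: Dict[str, Rule] = field(default_factory=dict)
    length: Dict[str, int] = field(default_factory=dict)

    def add_terminal(self, name: str, char: str) -> str:
        # Rule of the form  name -> char
        self.rules[name] = char
        self.length[name] = 1
        return name

    def add_concat(self, name: str, left: str, right: str) -> str:
        # Rule of the form  name -> left right; the derived length is stored,
        # so it is known without expanding the document.
        self.rules[name] = (left, right)
        self.length[name] = self.length[left] + self.length[right]
        return name

    def char_at(self, name: str, i: int) -> str:
        """Return the i-th character (0-based) of the document derived from name."""
        if not 0 <= i < self.length[name]:
            raise IndexError(i)
        rule = self.rules[name]
        while not isinstance(rule, str):
            left, right = rule
            if i < self.length[left]:
                name = left
            else:
                i -= self.length[left]
                name = right
            rule = self.rules[name]
        return rule

    def expand(self, name: str) -> str:
        """Fully decompress; only sensible for small documents."""
        rule = self.rules[name]
        if isinstance(rule, str):
            return rule
        left, right = rule
        return self.expand(left) + self.expand(right)


# Build the document (ab)^(2^40) by repeated doubling.
slp = SLP()
slp.add_terminal("A", "a")
slp.add_terminal("B", "b")
slp.add_concat("X0", "A", "B")
for i in range(1, 41):
    slp.add_concat(f"X{i}", f"X{i-1}", f"X{i-1}")

print(len(slp.rules))                     # number of rules in the SLP
print(slp.length["X40"])                  # length of the represented document
print(slp.char_at("X40", 2**41 - 1))      # last character, read without expanding
print(slp.expand("X2"))                   # small prefix of the construction
```

In this toy version, concatenating two documents is just one new rule, but the depth of the rule structure can grow without bound under repeated edits. The abstract's logarithmic guarantees depend on keeping the SLP balanced during updates, which this sketch deliberately leaves out.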