Query Evaluation over SLP-Represented Document Databases with Complex Document Editing
Markus L. Schmid, Nicole Schweikardt
Proceedings of the 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, 2022-06-12. DOI: 10.1145/3517804.3524158
It is known that the query result of a regular spanner over a single document D can be enumerated after O(|D|) preprocessing and with constant delay in data complexity (Florenzano et al., ACM TODS 2020; Amarilli et al., ACM TODS 2021). It has been shown (Schmid and Schweikardt, PODS'21) that if the document is represented by a straight-line program (SLP) S, then enumeration is possible with a delay of O(log |D|), but with preprocessing that is linear in |S| (which, in the best case, is logarithmic in |D|). Hence, this compressed setting allows for spanner evaluation in sub-linear time, i.e., with logarithmic upper bounds for preprocessing and delay, if the document is highly compressible. In this work, we extend these results to the dynamic setting. We consider a document database DDB = D_1, D_2, ..., D_m that is represented by an SLP S_DDB, and that supports regular spanners M_1, M_2, ..., M_k (meaning that we have data structures at our disposal that allow O(log |D_i|)-delay enumeration of the result of spanner M_j on document D_i). Then we can perform an update by manipulating the existing documents of DDB by a sequence of text-editing operations commonly found in text editors (like copy and paste, deleting or copying factors, concatenating documents, etc.), and add the thus constructed document to the database. Such an operation is called complex document editing and is given by an expression φ in a suitable algebra. Moreover, after this operation, the document database still supports all the regular spanners M_1, ..., M_k. The total time required for such an update is O(k |φ| log d), where d is the maximum length of any intermediate document constructed in the complex document editing described by φ. We stress the fact that the size |S_DDB| of the SLP (which upper bounds the preprocessing in the static case) is potentially logarithmic in the data, but generally depends on the compressibility of the documents (in the worst case, it is even linear in the data).
In contrast to that, we can guarantee that the dependency on the data of our updates is logarithmic regardless of the actual compression achieved by the SLP. In particular, any such update performed by complex document editing adds documents whose length may be exponentially larger than the time needed for performing such an update. Our approach hinges on balancing properties of SLPs, and for our updates it is vital to manipulate the SLP that represents the database in such a way that these balancing properties are maintained.
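
To illustrate why an SLP can describe a document exponentially longer than the work spent building it, the following is a minimal Python sketch, not the authors' data structure. The names `Node`, `leaf`, `concat`, `char_at`, and `expand` are hypothetical and chosen for illustration only. The sketch performs no rebalancing and does not evaluate spanners. It only shows that each concatenation adds a single rule and that random access takes a number of steps bounded by the depth of the SLP, which is the quantity that balancing is meant to keep logarithmic in the document length.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """One SLP rule: a terminal symbol or the concatenation of two rules."""
    length: int                      # length of the document this rule derives
    depth: int                       # height of the derivation DAG below it
    symbol: Optional[str] = None     # set only for terminal rules
    left: Optional["Node"] = None    # set only for concatenation rules
    right: Optional["Node"] = None


def leaf(symbol: str) -> Node:
    return Node(length=1, depth=0, symbol=symbol)


def concat(a: Node, b: Node) -> Node:
    # Adds exactly one new rule; the operands are shared, not copied.
    # Assumption: no rebalancing is done here, unlike in the paper's setting.
    return Node(length=a.length + b.length,
                depth=1 + max(a.depth, b.depth),
                left=a, right=b)


def char_at(node: Node, i: int) -> str:
    # Walk down from the root; the number of steps is at most node.depth.
    if not 0 <= i < node.length:
        raise IndexError(i)
    while node.symbol is None:
        if i < node.left.length:
            node = node.left
        else:
            i -= node.left.length
            node = node.right
    return node.symbol


def expand(node: Node) -> str:
    # Full decompression; only sensible for small documents.
    if node.symbol is not None:
        return node.symbol
    return expand(node.left) + expand(node.right)


# Build the document "ab", then concatenate it with itself 40 times.
doc = concat(leaf("a"), leaf("b"))
for _ in range(40):
    doc = concat(doc, doc)

print(doc.length)                    # length of the represented document
print(doc.depth)                     # number of levels char_at may descend
print(char_at(doc, doc.length - 1))  # last symbol, found without expanding
```

Here 41 concatenations produce a document of length 2^41, while `char_at` descends at most 41 levels. In this particular example the depth happens to stay logarithmic in the length. For arbitrary sequences of edits that is not guaranteed without rebalancing, which is why the abstract stresses that the balancing properties must be maintained.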