{"title":"On validation of XML streams using finite state machines","authors":"Cristiana Chitic, D. Rosu","doi":"10.1145/1017074.1017096","DOIUrl":null,"url":null,"abstract":"We study validation of streamed XML documents by means of finite state machines. Previous work has shown that validation is in principle possible by finite state automata, but the construction was prohibitively expensive, giving an exponential-size nondeterministic automaton. Instead, we want to find deterministic automata for validating streamed documents: for them, the complexity of validation is constant per tag. We show that for a reading window of size one and nonrecursive DTDs with one-unambiguous content (i.e. conforming to the current XML standard) there is an algorithm producing a deterministic automaton that validates documents with respect to that DTD. The size of the automaton is at most exponential and we give matching lower bounds. To capture the possible advantages offered by reading windows of size k, we introduce k-unambiguity as a generalization of one-unambiguity, and study the validation against DTDs with k-unambiguous content. We also consider recursive DTDs and give conditions under which they can be validated against by using one-counter automata.","PeriodicalId":93360,"journal":{"name":"Proceedings of the 5th International Workshop on Exploratory Search in Databases and the Web. International Workshop on Exploratory Search in Databases and the Web (5th : 2018 : Houston, Tex.)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2004-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"46","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 5th International Workshop on Exploratory Search in Databases and the Web. International Workshop on Exploratory Search in Databases and the Web (5th : 2018 : Houston, Tex.)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1017074.1017096","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 46
Abstract
We study validation of streamed XML documents by means of finite state machines. Previous work has shown that validation is in principle possible by finite state automata, but the construction was prohibitively expensive, giving an exponential-size nondeterministic automaton. Instead, we want to find deterministic automata for validating streamed documents: for them, the complexity of validation is constant per tag. We show that for a reading window of size one and nonrecursive DTDs with one-unambiguous content (i.e. conforming to the current XML standard) there is an algorithm producing a deterministic automaton that validates documents with respect to that DTD. The size of the automaton is at most exponential and we give matching lower bounds. To capture the possible advantages offered by reading windows of size k, we introduce k-unambiguity as a generalization of one-unambiguity, and study the validation against DTDs with k-unambiguous content. We also consider recursive DTDs and give conditions under which they can be validated against by using one-counter automata.