Complexity of Context-Free Grammars with Exceptions and the Inadequacy of Grammars as Models for XML and SGML

Markup Lang. Pub Date: 2000-12-01 DOI: 10.1162/109966201753537222

Romeo Rizzi


Citation count: 0

Abstract

The Standard Generalized Markup Language (SGML) and the Extensible Markup Language (XML) allow authors to better transmit the semantics in their documents by explicitly specifying the relevant structures in a document or class of documents by means of document type definitions (DTDs). Several authors have proposed to regard DTDs as extended context-free grammars expressed in a notation similar to extended Backus-Naur form. In addition, the SGML standard allows the semantics of content models (the right-hand sides of productions) to be modified by exceptions. Inclusion exceptions allow named elements to appear anywhere within the content of a content model, and exclusion exceptions preclude named elements from appearing in the content of a content model. Since XML does not allow exceptions, the problem of exception removal has received much interest recently. Motivated by this, Kilpeläinen and Wood have proved that exceptions do not increase the expressive power of extended context-free grammars and that, for each DTD with exceptions, we can obtain a structurally equivalent extended context-free grammar. Since their argument was based on an exponential simulation, they also conjectured that an exponential blow-up in the size of the grammar is a necessary evil when purging exceptions. We prove their conjecture under the realistic assumption that NP-complete problems do not admit non-uniform polynomial-time algorithms. Kilpeläinen and Wood also asked whether the parsing problem for extended context-free grammars with exceptions admits an efficient algorithmic solution. We show the NP-completeness of the very basic problem: given a string w and a context-free grammar G (not even extended) with exclusion exceptions (no inclusion exceptions needed), decide whether w belongs to the language generated by G. Our results and arguments highlight the limitations of using extended context-free grammars as a model of SGML, especially when one is interested in understanding issues related to exceptions.
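To make the membership problem concrete, the following is a minimal Python sketch of a brute-force membership test for a context-free grammar with exclusion exceptions. It is an illustration under stated assumptions, not the construction used in the paper. The function name `accepts`, the grammar encoding, and the toy grammars are all hypothetical. The sketch assumes that an exclusion attached to a production bans the named nonterminals from the entire subtree derived from that production (modeled on SGML exclusion exceptions), and that every right-hand side is either a single terminal or a sequence of at least two symbols, so each symbol derives at least one character.

```python
from functools import lru_cache


def accepts(grammar, start, word):
    """Brute-force membership test (illustrative sketch only).

    grammar maps each nonterminal to a list of (rhs, excluded) pairs, where
    rhs is a tuple of symbols and excluded is a frozenset of nonterminals
    banned from the whole subtree derived from that production (assumed
    semantics). Any symbol that is not a key of grammar is a terminal.
    Assumes each rhs is a single terminal or has at least two symbols.
    """

    @lru_cache(maxsize=None)
    def derives(symbol, i, j, banned):
        # Can `symbol` derive word[i:j] while avoiding every banned nonterminal?
        if symbol not in grammar:  # terminal: must match exactly one character
            return j - i == 1 and word[i] == symbol
        if symbol in banned:
            return False
        return any(
            matches(rhs, i, j, banned | excluded)
            for rhs, excluded in grammar[symbol]
        )

    @lru_cache(maxsize=None)
    def matches(rhs, i, j, banned):
        # Can the symbol sequence `rhs` derive word[i:j]?
        head, tail = rhs[0], rhs[1:]
        if not tail:
            return derives(head, i, j, banned)
        # Leave at least one character for each remaining symbol.
        return any(
            derives(head, i, k, banned) and matches(tail, k, j, banned)
            for k in range(i + 1, j - len(tail) + 1)
        )

    return derives(start, 0, len(word), frozenset())


# Hypothetical toy grammar: a Link may wrap an Item, but no Link may occur
# anywhere inside another Link.
GRAMMAR = {
    "Item": [(("t",), frozenset()), (("[", "Link", "]"), frozenset())],
    "Link": [(("<", "Item", ">"), frozenset({"Link"}))],
}

# The same grammar with the exclusion dropped, for comparison.
NO_EXCLUSIONS = {
    "Item": [(("t",), frozenset()), (("[", "Link", "]"), frozenset())],
    "Link": [(("<", "Item", ">"), frozenset())],
}

print(accepts(GRAMMAR, "Item", "[<t>]"))
print(accepts(GRAMMAR, "Item", "[<[<t>]>]"))
print(accepts(NO_EXCLUSIONS, "Item", "[<[<t>]>]"))
```

The memoization key includes the set of currently banned nonterminals, which in the worst case can take exponentially many values. That is consistent with the hardness result stated in the abstract, although this sketch proves nothing by itself.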

