Computational Model for Parsing Expression Grammars

arXiv - CS - Formal Languages and Automata Theory. Pub Date: 2024-06-21. DOI: arxiv-2406.14911

Alexander Rubtsov, Nikita Chudinov


Abstract

We present a computational model for Parsing Expression Grammars (PEGs). The predecessors of PEGs, top-down parsing languages (TDPLs), were introduced by A. Birman and J. Ullman in the 1960s. In 2004, B. Ford showed that both formalisms recognize the same class of languages, named Parsing Expression Languages (PELs). A. Birman and J. Ullman established important properties of TDPLs: they generate every deterministic context-free language (DCFL) as well as some non-context-free languages such as $a^nb^nc^n$, and they also constructed a linear-time parsing algorithm. Because this algorithm was impractical in the 1960s, TDPLs were abandoned. B. Ford later upgraded them to PEGs and improved the parsing algorithm from a practical point of view. Today PEGs are actively used both in compilers (e.g., Python replaced its LL(1) parser with a PEG-based one) and in text processing. In this paper, we present a computational model for PEGs and use it to obtain structural properties of PELs: namely, we prove that PELs are closed under left concatenation with languages from the Boolean closure of the regular closure of DCFLs. We also present an extension of the class of PELs based on an extension of our computational model. Our model upgrades deterministic pushdown automata (DPDA): when a symbol is popped, the automaton may return its input head to the position where that symbol was pushed. We provide a linear-time simulation algorithm for the two-way version of this model, similar to S. Cook's famous linear-time simulation algorithm for two-way DPDA.
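To make the claim that PEGs capture the non-context-free language $a^nb^nc^n$ concrete, here is a minimal sketch of a PEG recognizer written as Python parser combinators. It assumes the standard grammar `D <- &(A !'b') 'a'* B !.`, `A <- 'a' A? 'b'`, `B <- 'b' B? 'c'` (n >= 1). All names (`lit`, `seq`, `opt`, `Rule`, `recognize`, and so on) are hypothetical illustrations. This is not the authors' automaton model or their simulation algorithm. Nonterminals are memoized per input position in the spirit of Ford's packrat parsing, so each rule body runs at most once per position. The `&` predicate matches its argument and then returns to the position where it started. This loosely resembles the model's ability to return the head to the position of the matching push, but that is an informal analogy only.

```python
from typing import Callable, Dict, Optional, Tuple

# A parsing expression takes (input, position) and returns the position after
# a successful match, or None on failure (PEG semantics: deterministic, greedy).
Expr = Callable[[str, int], Optional[int]]


def lit(t: str) -> Expr:
    """Match the literal string t."""
    def parse(s: str, i: int) -> Optional[int]:
        return i + len(t) if s.startswith(t, i) else None
    return parse


def seq(*es: Expr) -> Expr:
    """e1 e2 ... en: every expression must match, one after another."""
    def parse(s: str, i: int) -> Optional[int]:
        for e in es:
            j = e(s, i)
            if j is None:
                return None
            i = j
        return i
    return parse


def opt(e: Expr) -> Expr:
    """e? is the ordered choice e / (empty): try e first, else match nothing."""
    def parse(s: str, i: int) -> Optional[int]:
        j = e(s, i)
        return i if j is None else j
    return parse


def star(e: Expr) -> Expr:
    """e*: greedy repetition that never gives characters back."""
    def parse(s: str, i: int) -> Optional[int]:
        while True:
            j = e(s, i)
            if j is None or j == i:  # stop on failure or an empty match
                return i
            i = j
    return parse


def and_(e: Expr) -> Expr:
    """&e: succeed without consuming input if e matches here."""
    def parse(s: str, i: int) -> Optional[int]:
        return i if e(s, i) is not None else None
    return parse


def not_(e: Expr) -> Expr:
    """!e: succeed without consuming input if e does NOT match here."""
    def parse(s: str, i: int) -> Optional[int]:
        return i if e(s, i) is None else None
    return parse


def any_char(s: str, i: int) -> Optional[int]:
    """'.': match any single character."""
    return i + 1 if i < len(s) else None


class Rule:
    """A nonterminal whose result is memoized per (input, position)."""

    def __init__(self) -> None:
        self.body: Optional[Expr] = None
        self.memo: Dict[Tuple[str, int], Optional[int]] = {}

    def __call__(self, s: str, i: int) -> Optional[int]:
        assert self.body is not None, "rule body not set"
        key = (s, i)
        if key not in self.memo:
            self.memo[key] = self.body(s, i)
        return self.memo[key]


A, B, D = Rule(), Rule(), Rule()
A.body = seq(lit("a"), opt(A), lit("b"))  # a^k b^k
B.body = seq(lit("b"), opt(B), lit("c"))  # b^k c^k
# &(A !'b') checks #a == #b without consuming input; then 'a'* B !. checks #b == #c.
D.body = seq(and_(seq(A, not_(lit("b")))), star(lit("a")), B, not_(any_char))


def recognize(s: str) -> bool:
    """True iff s = a^n b^n c^n for some n >= 1."""
    return D(s, 0) is not None
```

For example, `recognize("aabbcc")` accepts while `recognize("aabbbcc")` rejects. Each nested `a...b` pair adds a few Python stack frames, so very long inputs can hit the default recursion limit. The sketch is meant for illustration, not production parsing.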
