Morbig: a static parser for POSIX shell

Yann Régis-Gianas, Nicolas Jeannerod, R. Treinen
{"title":"Morbig: a static parser for POSIX shell","authors":"Yann Régis-Gianas, Nicolas Jeannerod, R. Treinen","doi":"10.1145/3276604.3276615","DOIUrl":null,"url":null,"abstract":"The POSIX shell language defies conventional wisdom of compiler construction on several levels: The shell language was not designed for static parsing, but with an intertwining of syntactic analysis and execution by expansion in mind. Token recognition cannot be specified by regular expressions, lexical analysis depends on the parsing context and the evaluation context, and the shell grammar given in the specification is ambiguous. Besides, the unorthodox design choices of the shell language fit badly in the usual specification languages used to describe other programming languages. This makes the standard usage of LEX and YACC as a pipeline inadequate for the implementation of a parser for POSIX shell. The existing implementations of shell parsers are complex and use low-level character-level parsing code which is difficult to relate to the POSIX specification. We find it hard to trust such parsers, especially when using them for writing automatic verification tools for shell scripts. This paper offers an overview of the technical difficulties related to the syntactic analysis of the POSIX shell language. It also describes how we have resolved these difficulties using advanced parsing techniques (namely speculative parsing, parser state introspection, context-dependent lexical analysis and longest-prefix parsing) while keeping the implementation at a sufficiently high level of abstraction so that experts can check that the POSIX standard is respected. The resulting tool, called MORBIG, is an open-source static parser for a well-defined and realistic subset of the POSIX shell language. Its implementation crucially relies on the purity and incrementality of LR(1) parsers generated by MENHIR, a parser generator for OCaml.","PeriodicalId":117525,"journal":{"name":"Proceedings of the 11th ACM SIGPLAN International Conference on Software Language Engineering","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 11th ACM SIGPLAN International Conference on Software Language Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3276604.3276615","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

Abstract

The POSIX shell language defies conventional wisdom of compiler construction on several levels: The shell language was not designed for static parsing, but with an intertwining of syntactic analysis and execution by expansion in mind. Token recognition cannot be specified by regular expressions, lexical analysis depends on the parsing context and the evaluation context, and the shell grammar given in the specification is ambiguous. Besides, the unorthodox design choices of the shell language fit badly in the usual specification languages used to describe other programming languages. This makes the standard usage of LEX and YACC as a pipeline inadequate for the implementation of a parser for POSIX shell. The existing implementations of shell parsers are complex and use low-level character-level parsing code which is difficult to relate to the POSIX specification. We find it hard to trust such parsers, especially when using them for writing automatic verification tools for shell scripts. This paper offers an overview of the technical difficulties related to the syntactic analysis of the POSIX shell language. It also describes how we have resolved these difficulties using advanced parsing techniques (namely speculative parsing, parser state introspection, context-dependent lexical analysis and longest-prefix parsing) while keeping the implementation at a sufficiently high level of abstraction so that experts can check that the POSIX standard is respected. The resulting tool, called MORBIG, is an open-source static parser for a well-defined and realistic subset of the POSIX shell language. Its implementation crucially relies on the purity and incrementality of LR(1) parsers generated by MENHIR, a parser generator for OCaml.
一个用于POSIX shell的静态解析器
POSIX shell语言在几个层次上违背了编译器构造的传统智慧:shell语言不是为静态解析而设计的,而是通过扩展将语法分析和执行交织在一起。令牌识别不能通过正则表达式指定,词法分析依赖于解析上下文和求值上下文,规范中给出的shell语法是不明确的。此外,shell语言的非正统设计选择非常不适合用于描述其他编程语言的通常规范语言。这使得LEX和YACC作为管道的标准用法不足以实现POSIX shell的解析器。shell解析器的现有实现非常复杂,并且使用低级的字符级解析代码,这很难与POSIX规范相关联。我们发现很难信任这样的解析器,特别是当使用它们为shell脚本编写自动验证工具时。本文概述了与POSIX shell语言的语法分析相关的技术难点。它还描述了我们如何使用高级解析技术(即推测解析、解析器状态自省、与上下文相关的词法分析和最长前缀解析)解决这些困难,同时保持实现在足够高的抽象级别,以便专家可以检查POSIX标准是否得到遵守。由此产生的工具称为MORBIG,它是一个开源静态解析器,用于POSIX shell语言的一个定义良好且实际的子集。它的实现主要依赖于MENHIR (OCaml的解析器生成器)生成的LR(1)解析器的纯度和递增性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信