Matthew Fenwick, Gerard Weatherby, Heidi Jc Ellis, Michael R Gryk
Proceedings of the ... International Conference on Information Technology: New Generations. Published 2013-01-01.
DOI: 10.1109/ITNG.2013.39 (https://doi.org/10.1109/ITNG.2013.39)
Citations: 2
Parser Combinators: a Practical Application for Generating Parsers for NMR Data
Nuclear Magnetic Resonance (NMR) spectroscopy is a technique for acquiring protein data at atomic resolution and determining the three-dimensional structure of large protein molecules. A typical structure determination process results in the deposition of large data sets to the BMRB (Biological Magnetic Resonance Data Bank). These data are stored and shared in a file format called NMR-Star. This format is syntactically and semantically complex, which makes it challenging to parse. Nevertheless, parsing these files is crucial to applying the vast amounts of biological information they contain, allowing researchers to harness the results of previous studies to direct and validate future work. One powerful approach to parsing files is to apply a Backus-Naur Form (BNF) grammar, which is a high-level model of a file format. The translation of the grammatical model into an executable parser can be accomplished automatically. This paper shows how we applied a model BNF grammar of the NMR-Star format to create a free, open-source parser using "parser combinators", a method that originated in the functional programming world. It demonstrates the effectiveness of a principled approach to file specification and parsing. It also builds upon our previous work [1] in that 1) it applies concepts from functional programming (which remain relevant even though the implementation language, Java, is more mainstream than functional programming languages), and 2) all work and accomplishments from this project will be made available under standard open source licenses, giving the community the opportunity to learn from our techniques and methods.
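To make the idea of translating a BNF grammar into an executable parser more concrete, the following is a minimal sketch of parser combinators in Java. It is not the authors' parser: the class `CombinatorSketch`, the helpers `literal` and `takeWhile1`, and the three-rule grammar in the comments are all hypothetical, and the grammar is a drastic simplification of a single tag/value item rather than the real NMR-Star specification. The point is only that each BNF operator (sequence, choice) can map onto one small combinator, so the Java code mirrors the grammar rule by rule.

```java
import java.util.Optional;
import java.util.function.BiFunction;
import java.util.function.Predicate;

final class CombinatorSketch {

    /** A successful parse: the value produced plus the unconsumed input. */
    static final class Result<T> {
        final T value;
        final String rest;
        Result(T value, String rest) { this.value = value; this.rest = rest; }
    }

    /** A parser maps input to a result, or to Optional.empty() on failure. */
    interface Parser<T> {
        Optional<Result<T>> parse(String input);

        /** Sequence (BNF juxtaposition): run this, then next, and combine both values. */
        default <U, V> Parser<V> then(Parser<U> next, BiFunction<T, U, V> combine) {
            return input -> parse(input).flatMap(a ->
                next.parse(a.rest).map(b ->
                    new Result<V>(combine.apply(a.value, b.value), b.rest)));
        }

        /** Choice (BNF "|"): try this parser; if it fails, try other on the same input. */
        default Parser<T> or(Parser<T> other) {
            return input -> {
                Optional<Result<T>> first = parse(input);
                return first.isPresent() ? first : other.parse(input);
            };
        }
    }

    /** Primitive parser: succeeds only if the input starts with the given text. */
    static Parser<String> literal(String text) {
        return input -> {
            if (!input.startsWith(text)) return Optional.empty();
            return Optional.of(new Result<String>(text, input.substring(text.length())));
        };
    }

    /** Primitive parser: consumes one or more characters accepted by the predicate. */
    static Parser<String> takeWhile1(Predicate<Character> accepted) {
        return input -> {
            int i = 0;
            while (i < input.length() && accepted.test(input.charAt(i))) i++;
            if (i == 0) return Optional.empty();
            return Optional.of(new Result<String>(input.substring(0, i), input.substring(i)));
        };
    }

    // Hypothetical, simplified grammar (an assumption, not the NMR-Star specification):
    //   item   ::= tag ws value
    //   tag    ::= "_" name
    //   value  ::= quoted | bare
    //   quoted ::= "'" chars "'"
    static final Parser<String> WS = takeWhile1(c -> Character.isWhitespace(c));
    static final Parser<String> NAME = takeWhile1(c -> !Character.isWhitespace(c));
    static final Parser<String> TAG =
        literal("_").then(NAME, (underscore, name) -> underscore + name);
    static final Parser<String> QUOTED =
        literal("'").then(takeWhile1(c -> c != '\''), (open, text) -> text)
                    .then(literal("'"), (text, close) -> text);
    static final Parser<String> VALUE = QUOTED.or(NAME);
    static final Parser<String[]> ITEM =
        TAG.then(WS, (tag, ws) -> tag)
           .then(VALUE, (tag, value) -> new String[] {tag, value});

    public static void main(String[] args) {
        // The sample line is illustrative input, not taken from a real BMRB entry.
        Result<String[]> r = ITEM.parse("_Entry.Title 'Solution structure'\n").get();
        System.out.println(r.value[0] + " = " + r.value[1]);
    }
}
```

In this sketch a failed parse is just an empty `Optional`, which keeps the combinators short but discards error positions; a parser meant for real deposited files would presumably need to report where and why parsing failed.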