On-demand JSON: A better way to parse documents?
John Keiser, Daniel Lemire
Software: Practice and Experience, published 2024-01-18
DOI: 10.1002/spe.3313 (https://doi.org/10.1002/spe.3313)
Citations: 0
Abstract
JSON is a popular standard for data interchange on the Internet, and ingesting JSON documents can be a performance bottleneck. A popular parsing strategy converts the input text into a tree-based data structure, sometimes called a Document Object Model or DOM. We designed and implemented a novel JSON parsing interface, called On-Demand, that appears to the programmer like a conventional DOM-based approach. However, the underlying implementation is a pointer iterating through the content, materializing the results (objects, arrays, strings, numbers) only lazily. On recent commodity processors, an implementation of our approach provides superior performance in multiple benchmarks. To ensure reproducibility, our work is freely available as open source software. Several systems use On-Demand: for example, Apache Doris, the Node.js JavaScript runtime, Milvus, and Velox.
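The contrast between building a full tree and iterating with a cursor can be illustrated with a toy sketch. The Python below is not the authors' implementation or API: the helper names (`skip_ws`, `skip_string`, `skip_value`, `find_field`) are hypothetical, and the code assumes well-formed input. It walks a top-level JSON object with an index into the text, skips the values it does not need without decoding them, and materializes only the requested field.

```python
import json

def skip_ws(s, i):
    """Advance past JSON whitespace."""
    while i < len(s) and s[i] in " \t\r\n":
        i += 1
    return i

def skip_string(s, i):
    """s[i] is an opening quote; return the index just past the closing quote."""
    i += 1
    while s[i] != '"':
        i += 2 if s[i] == "\\" else 1  # step over escaped characters
    return i + 1

def skip_value(s, i):
    """Return the index just past the value starting at s[i], without decoding it."""
    if s[i] == '"':
        return skip_string(s, i)
    if s[i] in "{[":
        depth = 0
        while True:
            c = s[i]
            if c == '"':
                i = skip_string(s, i)  # brackets inside strings do not count
                continue
            if c in "{[":
                depth += 1
            elif c in "}]":
                depth -= 1
                if depth == 0:
                    return i + 1
            i += 1
    # number, true, false or null: scan to the next delimiter
    while i < len(s) and s[i] not in ",}] \t\r\n":
        i += 1
    return i

def find_field(s, key):
    """Scan a top-level object and decode only the value stored under `key`."""
    i = skip_ws(s, 0)
    if s[i] != "{":
        raise ValueError("expected a JSON object")
    i = skip_ws(s, i + 1)
    while s[i] != "}":
        if s[i] != '"':
            raise ValueError("expected a field name")
        end = skip_string(s, i)
        name = json.loads(s[i:end])
        i = skip_ws(s, end)
        if s[i] != ":":
            raise ValueError("expected ':'")
        i = skip_ws(s, i + 1)
        end = skip_value(s, i)
        if name == key:
            return json.loads(s[i:end])  # materialize only this value
        i = skip_ws(s, end)
        if s[i] == ",":
            i = skip_ws(s, i + 1)
    raise KeyError(key)

doc = '{"user": {"id": 7, "tags": ["a", "b}"]}, "count": 42}'
print(find_field(doc, "count"))  # prints 42
```

In this sketch the nested `user` object is stepped over character by character and never turned into Python objects, which is the sense in which values are materialized lazily. A DOM-style call such as `json.loads(doc)["count"]` would decode the whole document first.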