Describe Data to get Science-Data-Ready Tooling: Awkward as a Target for Kaitai Struct YAML

Manasvi Goyal, Andrea Zonca, Amy Roberts, Jim Pivarski, Ianna Osborne
arXiv:2407.14461 | arXiv - CS - Programming Languages | Published 2024-07-19
Citations: 0

Abstract

In some fields, scientific data formats differ across experiments because of specialized hardware and data acquisition systems. Researchers must develop, document, and maintain experiment-specific analysis software to read these formats, and that software is often tightly coupled to a single data format. This proliferation of custom data formats has been a prominent challenge for small- to mid-scale experiments. For the Large Hadron Collider experiments, the widespread adoption of ROOT has largely mitigated the problem; many smaller experiments, however, continue to use custom data formats to meet specific research needs. Simplifying access to such formats for analysis is therefore of great value to scientific communities within high-energy physics (HEP). To this end, we have added Awkward Arrays as a target language for Kaitai Struct. Researchers describe their custom data format in the Kaitai Struct YAML (KSY) language, and from that description the Kaitai Struct Compiler generates C++ code that fills LayoutBuilder buffers. In a few steps, the Kaitai Struct Awkward Runtime API converts the generated C++ code into a compiled Python module, to which raw data can be passed to produce Awkward Arrays. This paper introduces the Awkward Target for the Kaitai Struct Compiler and the Kaitai Struct Awkward Runtime API, and demonstrates the conversion of a KSY description of a specific custom file format into Awkward Arrays.
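The pipeline in the abstract turns a format description into code that fills columnar buffers. As a rough, hand-written illustration of what such generated code does (this is not the Kaitai Struct Compiler's actual output, not C++, and not the Kaitai Struct Awkward Runtime API), the Python sketch below parses a hypothetical binary format into an offsets buffer and a flat content buffer, the same offsets-plus-content layout an Awkward ListOffsetArray uses. The file format and the names `parse_events`, `encode_events`, and `events_from_buffers` are assumptions made up for this example.

```python
import struct
from typing import List, Tuple

# HYPOTHETICAL toy format, assumed for illustration only (not from the paper):
#   header : uint32, little-endian      -> number of events
#   event  : uint16, little-endian      -> number of hits n
#            n x float32, little-endian -> hit energies


def parse_events(raw: bytes) -> Tuple[List[int], List[float]]:
    """Parse raw bytes into two columnar buffers: offsets and flat content.

    Event i's hits are content[offsets[i]:offsets[i + 1]], i.e. the
    offsets-plus-content layout of an Awkward ListOffsetArray.
    Truncated input raises struct.error; trailing bytes raise ValueError.
    """
    pos = 0
    (n_events,) = struct.unpack_from("<I", raw, pos)
    pos += 4
    offsets = [0]               # leading zero, then one end-position per event
    content: List[float] = []   # all hit energies, concatenated
    for _ in range(n_events):
        (n_hits,) = struct.unpack_from("<H", raw, pos)
        pos += 2
        hits = struct.unpack_from(f"<{n_hits}f", raw, pos)
        pos += 4 * n_hits
        content.extend(hits)
        offsets.append(len(content))  # end of this event = start of the next
    if pos != len(raw):
        raise ValueError(f"trailing bytes: parsed {pos} of {len(raw)}")
    return offsets, content


def encode_events(events: List[List[float]]) -> bytes:
    """Inverse of parse_events; used here only to build sample input."""
    out = bytearray(struct.pack("<I", len(events)))
    for hits in events:
        out += struct.pack("<H", len(hits))
        out += struct.pack(f"<{len(hits)}f", *hits)
    return bytes(out)


def events_from_buffers(offsets: List[int], content: List[float]) -> List[List[float]]:
    """Rebuild the nested (jagged) view from the two columnar buffers."""
    return [content[start:stop] for start, stop in zip(offsets, offsets[1:])]


if __name__ == "__main__":
    raw = encode_events([[1.5, 2.25], [], [4.0]])
    offsets, content = parse_events(raw)
    print(offsets)                               # [0, 2, 2, 3]
    print(content)                               # [1.5, 2.25, 4.0]
    print(events_from_buffers(offsets, content))  # [[1.5, 2.25], [], [4.0]]
```

With Awkward Array installed, passing `content` and the per-event counts (differences between consecutive offsets) to `ak.unflatten` should yield the equivalent jagged array; that step is not exercised in the sketch. In the workflow the abstract describes, this parsing loop would instead be generated from the KSY description and the buffers filled through LayoutBuilder in C++, so none of it has to be written by hand.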