High-performance data format for scientific data storage and analysis
Gagik Gavalian
Computer Physics Communications, volume 315, Article 109732. Published 2025-06-30.
DOI: 10.1016/j.cpc.2025.109732
https://www.sciencedirect.com/science/article/pii/S0010465525002346
In this article, we present the High-Performance Output (HiPO) data format, developed at Jefferson Laboratory for storing and analyzing data from nuclear physics experiments. The format is designed to store large volumes of experimental data efficiently using modern fast compression algorithms. The goal of the development was to organize the output data so that relevant information can be accessed easily within large data files. HiPO is suited to storing raw detector data, reconstruction data, and final physics analysis data efficiently, eliminating the need for data conversions over the lifecycle of the experimental data. The format is implemented in C++ and Java, with bindings for Fortran, Python, and Julia, giving users a choice of data analysis frameworks. In this paper, we present the general design and functionality of the HiPO library and compare its performance with more established data formats used for data analysis in high energy and nuclear physics (such as ROOT [3] and Parquet).
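The idea of organizing compressed output so that only the relevant parts of a large file need to be read can be illustrated with a minimal sketch. This is not the HiPO API or its file layout: the function names, the integer tags, and the in-memory index are hypothetical, and Python's standard zlib module stands in for the fast compression algorithms mentioned in the abstract.

```python
import io
import zlib


def write_records(stream, records):
    """Write (tag, payload) records as independently compressed blocks.

    Returns an index of (tag, offset, compressed_length) entries.
    The tag scheme and index layout are illustrative assumptions.
    """
    index = []
    for tag, payload in records:
        block = zlib.compress(payload)
        index.append((tag, stream.tell(), len(block)))
        stream.write(block)
    return index


def read_tagged(stream, index, wanted_tag):
    """Read and decompress only the blocks whose tag matches wanted_tag."""
    out = []
    for tag, offset, length in index:
        if tag != wanted_tag:
            continue  # non-matching blocks are never read or decompressed
        stream.seek(offset)
        out.append(zlib.decompress(stream.read(length)))
    return out


# Hypothetical payloads: tag 1 = raw detector data, tag 2 = reconstruction data.
buf = io.BytesIO()
records = [
    (1, b"raw detector hits " * 100),
    (2, b"reconstructed tracks " * 100),
    (1, b"more raw detector hits " * 100),
]
index = write_records(buf, records)
selected = read_tagged(buf, index, 2)
```

Because each block is compressed on its own and located through the index, a reader interested only in one category of data can skip the rest of the file without decompressing it.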
About the journal:
The focus of CPC is on contemporary computational methods and techniques and their implementation, the effectiveness of which will normally be evidenced by the author(s) within the context of a substantive problem in physics. Within this setting CPC publishes two types of paper.
Computer Programs in Physics (CPiP)
These papers describe significant computer programs to be archived in the CPC Program Library which is held in the Mendeley Data repository. The submitted software must be covered by an approved open source licence. Papers and associated computer programs that address a problem of contemporary interest in physics that cannot be solved by current software are particularly encouraged.
Computational Physics Papers (CP)
These are research papers on themes across computational physics and related disciplines, including but not limited to the following:
mathematical and numerical methods and algorithms;
computational models including those associated with the design, control and analysis of experiments; and
algebraic computation.
Each will normally include software implementation and performance details. The software implementation should, ideally, be available via GitHub, Zenodo or an institutional repository. In addition, research papers on the impact of advanced computer architecture and special purpose computers on computing in the physical sciences and software topics related to, and of importance in, the physical sciences may be considered.