Storage optimisation and distributed architecture for time series reconstruction of massive astronomical catalogues

IF 2.7 · Zone 3 (Physics and Astrophysics) · JCR Q2 (ASTRONOMY & ASTROPHYSICS)
Qing Zhao, Le Sun, Mengxiang Zhang, Chengkui Zhang, Chenzhou Cui, Dongwei Fan
Experimental Astronomy, vol. 56, no. 2-3, pp. 821-845. Published 2023-09-28. DOI: 10.1007/s10686-023-09913-9. https://link.springer.com/article/10.1007/s10686-023-09913-9

Abstract

Time series reconstruction of astronomical catalogues is an important part of data archiving and the basis for time-domain astronomical analysis. As the field of view and sampling frequency of time-domain telescopes increase, the volume of data to be processed keeps growing, and optimizing the space and time efficiency of reconstruction has become a pressing problem. To improve space efficiency, we propose a time series data compression algorithm based on the negative database and dynamic programming and, on this basis, design a multi-level storage and query architecture for hot and non-hot data that greatly reduces storage space while preserving query efficiency. To improve time efficiency, we propose a spatio-temporal data partitioning and layout algorithm suited to distributed architectures. Its nested round-robin strategy balances load across queries that differ in spatial location, temporal location, and temporal range, which helps maintain the execution efficiency of the distributed system. Experimental results show that the proposed optimizations keep load skewness at about 4% and save about 83% of storage space.
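
The abstract names a nested round-robin layout but does not specify it. As a rough illustration only, the Python sketch below shows one way such a layout could spread spatio-temporal blocks over nodes: the starting node rotates with the spatial partition index, and blocks then cycle over nodes along the time axis. The functions `assign_node`, `query_loads`, and `load_skewness`, the block indexing, and the skewness definition (relative excess of the busiest node over the mean) are all assumptions made for this example, not the paper's algorithm or metric.

```python
from collections import Counter


def assign_node(space_idx: int, time_idx: int, n_nodes: int) -> int:
    """Hypothetical nested round-robin placement.

    The spatial index shifts the starting node, and the time index
    cycles over the nodes from that starting point.
    """
    return (space_idx + time_idx) % n_nodes


def query_loads(space_range, time_range, n_nodes: int) -> list:
    """Count how many blocks of a query each node would have to read."""
    counts = Counter()
    for s in space_range:
        for t in time_range:
            counts[assign_node(s, t, n_nodes)] += 1
    return [counts[n] for n in range(n_nodes)]


def load_skewness(loads) -> float:
    """Assumed metric: relative excess of the busiest node over the mean."""
    mean = sum(loads) / len(loads)
    return (max(loads) - mean) / mean


if __name__ == "__main__":
    n_nodes = 4
    # A long time series for a single sky partition.
    print(query_loads([3], range(8), n_nodes))
    # A single epoch across many sky partitions.
    print(query_loads(range(8), [5], n_nodes))
    # A small query window that cannot be split evenly.
    small = query_loads(range(2), range(3), n_nodes)
    print(small, load_skewness(small))
```

Under this toy layout, a query that is long in time and a query that is wide in space both touch every node equally, which is the kind of balance the abstract attributes to the nested strategy. Small query windows can still be uneven, so a skewness measure is needed to quantify the residual imbalance.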


Source journal: Experimental Astronomy
Category: Earth Science and Astronomy, Astronomy and Astrophysics
CiteScore: 5.30
Self-citation rate: 3.30%
Articles published: 57
Review time: 6-12 weeks
About the journal: Many new instruments for observing astronomical objects at a variety of wavelengths have been and are continually being developed. Furthermore, a vast amount of effort is being put into the development of new techniques for data analysis in order to cope with the great streams of data collected by these instruments. Experimental Astronomy acts as a medium for the publication of papers of contemporary scientific interest on astrophysical instrumentation and methods necessary for the conduct of astronomy at all wavelength fields. Experimental Astronomy publishes full-length articles, research letters and reviews on developments in detection techniques, instruments, and data analysis and image processing techniques. Occasional special issues are published, giving an in-depth presentation of the instrumentation and/or analysis connected with specific projects, such as satellite experiments or ground-based telescopes, or of specialized techniques.