基于天文数据热相关性的分层存储算法研究

X. Ye, Hailong Zhang, Jie Wang, Ya-Zhou Zhang, Xu Du, Han Wu
{"title":"基于天文数据热相关性的分层存储算法研究","authors":"X. Ye, Hailong Zhang, Jie Wang, Ya-Zhou Zhang, Xu Du, Han Wu","doi":"10.3389/fspas.2024.1371249","DOIUrl":null,"url":null,"abstract":"With the surge in astronomical data volume, modern astronomical research faces significant challenges in data storage, processing, and access. The I/O bottleneck issue in astronomical data processing is particularly prominent, limiting the efficiency of data processing. To address this issue, this paper proposes a tiered storage algorithm based on the access characteristics of astronomical data. The C4.5 decision tree algorithm is employed as the foundation to implement an astronomical data access correlation algorithm. Additionally, a data copy migration strategy is designed based on tiered storage technology to achieve efficient data access. Preprocessing tests were conducted on 418GB NSRT (Nanshan Radio Telescope) formaldehyde spectral line data, showcasing that tiered storage can potentially reduce data processing time by up to 38.15%. Similarly, utilizing 802.2 GB data from FAST (Five-hundred-meter Aperture Spherical radio Telescope) observations for pulsar search data processing tests, the tiered storage approach demonstrated a maximum reduction of 29.00% in data processing time. In concurrent testing of data processing workflows, the proposed astronomical data heat correlation algorithm in this paper achieved an average reduction of 17.78% in data processing time compared to centralized storage. Furthermore, in comparison to traditional heat algorithms, it reduced data processing time by 5.15%. The effectiveness of the proposed algorithm is positively correlated with the associativity between the algorithm and the processed data. The tiered storage algorithm based on the characteristics of astronomical data proposed in this paper is poised to provide algorithmic references for large-scale data processing in the field of astronomy in the future.","PeriodicalId":507437,"journal":{"name":"Frontiers in Astronomy and Space Sciences","volume":"2 6","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Study on tiered storage algorithm based on heat correlation of astronomical data\",\"authors\":\"X. Ye, Hailong Zhang, Jie Wang, Ya-Zhou Zhang, Xu Du, Han Wu\",\"doi\":\"10.3389/fspas.2024.1371249\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"With the surge in astronomical data volume, modern astronomical research faces significant challenges in data storage, processing, and access. The I/O bottleneck issue in astronomical data processing is particularly prominent, limiting the efficiency of data processing. To address this issue, this paper proposes a tiered storage algorithm based on the access characteristics of astronomical data. The C4.5 decision tree algorithm is employed as the foundation to implement an astronomical data access correlation algorithm. Additionally, a data copy migration strategy is designed based on tiered storage technology to achieve efficient data access. Preprocessing tests were conducted on 418GB NSRT (Nanshan Radio Telescope) formaldehyde spectral line data, showcasing that tiered storage can potentially reduce data processing time by up to 38.15%. Similarly, utilizing 802.2 GB data from FAST (Five-hundred-meter Aperture Spherical radio Telescope) observations for pulsar search data processing tests, the tiered storage approach demonstrated a maximum reduction of 29.00% in data processing time. In concurrent testing of data processing workflows, the proposed astronomical data heat correlation algorithm in this paper achieved an average reduction of 17.78% in data processing time compared to centralized storage. Furthermore, in comparison to traditional heat algorithms, it reduced data processing time by 5.15%. The effectiveness of the proposed algorithm is positively correlated with the associativity between the algorithm and the processed data. The tiered storage algorithm based on the characteristics of astronomical data proposed in this paper is poised to provide algorithmic references for large-scale data processing in the field of astronomy in the future.\",\"PeriodicalId\":507437,\"journal\":{\"name\":\"Frontiers in Astronomy and Space Sciences\",\"volume\":\"2 6\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-03-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Frontiers in Astronomy and Space Sciences\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3389/fspas.2024.1371249\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in Astronomy and Space Sciences","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3389/fspas.2024.1371249","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

随着天文数据量的激增,现代天文研究在数据存储、处理和访问方面面临着巨大挑战。其中,天文数据处理中的 I/O 瓶颈问题尤为突出,限制了数据处理的效率。针对这一问题,本文提出了一种基于天文数据访问特性的分层存储算法。以 C4.5 决策树算法为基础,实现了天文数据访问关联算法。此外,还设计了基于分层存储技术的数据拷贝迁移策略,以实现高效的数据访问。对 418GB NSRT(南山射电望远镜)甲醛谱线数据进行了预处理测试,结果表明分层存储可将数据处理时间减少 38.15%。同样,利用来自 FAST(五百米孔径球面射电望远镜)观测的 802.2GB 数据进行脉冲星搜索数据处理测试时,分层存储方法显示数据处理时间最多可缩短 29.00%。在数据处理工作流程的并行测试中,与集中存储相比,本文提出的天文数据热相关算法平均减少了 17.78% 的数据处理时间。此外,与传统的热算法相比,它还减少了 5.15%的数据处理时间。所提算法的有效性与算法和处理数据之间的关联性呈正相关。本文提出的基于天文数据特点的分层存储算法,有望为今后天文学领域的大规模数据处理提供算法参考。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Study on tiered storage algorithm based on heat correlation of astronomical data
With the surge in astronomical data volume, modern astronomical research faces significant challenges in data storage, processing, and access. The I/O bottleneck issue in astronomical data processing is particularly prominent, limiting the efficiency of data processing. To address this issue, this paper proposes a tiered storage algorithm based on the access characteristics of astronomical data. The C4.5 decision tree algorithm is employed as the foundation to implement an astronomical data access correlation algorithm. Additionally, a data copy migration strategy is designed based on tiered storage technology to achieve efficient data access. Preprocessing tests were conducted on 418GB NSRT (Nanshan Radio Telescope) formaldehyde spectral line data, showcasing that tiered storage can potentially reduce data processing time by up to 38.15%. Similarly, utilizing 802.2 GB data from FAST (Five-hundred-meter Aperture Spherical radio Telescope) observations for pulsar search data processing tests, the tiered storage approach demonstrated a maximum reduction of 29.00% in data processing time. In concurrent testing of data processing workflows, the proposed astronomical data heat correlation algorithm in this paper achieved an average reduction of 17.78% in data processing time compared to centralized storage. Furthermore, in comparison to traditional heat algorithms, it reduced data processing time by 5.15%. The effectiveness of the proposed algorithm is positively correlated with the associativity between the algorithm and the processed data. The tiered storage algorithm based on the characteristics of astronomical data proposed in this paper is poised to provide algorithmic references for large-scale data processing in the field of astronomy in the future.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信