IF 2.3; Zone 3, Biology; JCR Q2, MULTIDISCIPLINARY SCIENCES
PeerJ Pub Date : 2024-12-04 eCollection Date: 2024-01-01 DOI:10.7717/peerj.18585
Davide Consoli, Leandro Parente, Rolf Simoes, Murat Şahin, Xuemeng Tian, Martijn Witjes, Lindsey Sloat, Tomislav Hengl
PeerJ, volume 12, e18585. Publication type: Journal Article.
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11624844/pdf/
Article page: https://doi.org/10.7717/peerj.18585
A computational framework for processing time-series of earth observation data based on discrete convolution: global-scale historical Landsat cloud-free aggregates at 30 m spatial resolution.

Processing large collections of earth observation (EO) time-series, often petabyte-sized, such as those from NASA's Landsat and ESA's Sentinel missions, can be computationally prohibitive and costly. Despite their name, even the Analysis Ready Data (ARD) versions of such collections can rarely be used as direct input for modeling because of cloud presence and/or prohibitive storage size. Existing solutions for readily using these data are not openly available, are poor in performance, or lack flexibility. Addressing this issue, we developed TSIRF (Time-Series Iteration-free Reconstruction Framework), a computational framework that can be used to apply diverse time-series processing tasks, such as temporal aggregation and time-series reconstruction, by simply adjusting the convolution kernel. As the first large-scale application, TSIRF was employed to process the entire Global Land Analysis and Discovery (GLAD) ARD Landsat archive, producing a cloud-free bi-monthly aggregated product. This process, covering seven Landsat bands globally from 1997 to 2022, with more than two trillion pixels and for each one a time-series of 156 samples in the aggregated product, required approximately 28 hours of computation using 1248 Intel® Xeon® Gold 6248R CPUs. The quality of the result was assessed using a benchmark dataset derived from the aggregated product and comparing different imputation strategies. The resulting reconstructed images can be used as input for machine learning models or to map biophysical indices. To further limit the storage size, the produced data was saved as 8-bit Cloud-Optimized GeoTIFFs (COG). With about 20 TB hosted per band/index for an entire 30 m resolution bi-monthly historical time-series distributed as open data, the product enables seamless, fast, and affordable access to the Landsat archive for environmental monitoring and analysis applications.
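
The abstract describes reconstruction as a matter of choosing a convolution kernel. As a rough illustration of that idea only, the sketch below uses generic normalized convolution on a single pixel's time-series: the valid samples and the validity mask are convolved with the same kernel, and the two results are divided. This is not TSIRF's implementation; the function name `normalized_convolution`, the kernel weights, and the sample values are all assumptions made for the example.

```python
import numpy as np


def normalized_convolution(values, valid, kernel):
    """Gap-fill a 1-D series by normalized convolution (illustrative sketch).

    values: samples of one pixel over time (NaN allowed where invalid)
    valid:  boolean mask, True where the sample is usable (e.g. cloud-free)
    kernel: odd-length, symmetric weights no longer than the series
    """
    values = np.asarray(values, dtype=float)
    valid = np.asarray(valid, dtype=bool)
    kernel = np.asarray(kernel, dtype=float)

    # Zero out invalid samples so they contribute nothing to the weighted sum.
    masked = np.where(valid, values, 0.0)

    # Weighted sum of valid samples, and the sum of the weights actually used.
    num = np.convolve(masked, kernel, mode="same")
    den = np.convolve(valid.astype(float), kernel, mode="same")

    # Divide where at least one valid sample fell under the kernel;
    # positions with no support stay NaN.
    out = np.full(values.shape, np.nan)
    np.divide(num, den, out=out, where=den > 0)
    return out


# Hypothetical series with three missing (e.g. cloudy) time steps.
series = np.array([10.0, np.nan, 30.0, np.nan, np.nan, 60.0])
mask = ~np.isnan(series)
kernel = np.array([0.5, 1.0, 0.5])  # assumed weights, chosen for illustration

filled = normalized_convolution(series, mask, kernel)
print(filled)
```

Because the whole series is handled by two convolutions and one division, no per-gap iteration is needed. In this sketch, changing `kernel` changes the behavior: a wider kernel bridges longer gaps, and a one-sided kernel would draw only on samples from one temporal direction.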

Source journal: PeerJ (MULTIDISCIPLINARY SCIENCES)
CiteScore: 4.70
Self-citation rate: 3.70%
Articles published: 1665
Review time: 10 weeks
Journal description: PeerJ is an open access peer-reviewed scientific journal covering research in the biological and medical sciences. At PeerJ, authors take out a lifetime publication plan (for as little as $99) which allows them to publish articles in the journal for free, forever. PeerJ has 5 Nobel Prize winners on its board, has won several industry and media awards, and is widely recognized as one of the most interesting recent developments in academic publishing.