A computational framework for processing time-series of earth observation data based on discrete convolution: global-scale historical Landsat cloud-free aggregates at 30 m spatial resolution.

IF 2.3 · CAS Zone 3 (Biology) · Q2 MULTIDISCIPLINARY SCIENCES

PeerJ · Pub Date: 2024-12-04 · eCollection Date: 2024-01-01 · DOI: 10.7717/peerj.18585

Davide Consoli, Leandro Parente, Rolf Simoes, Murat Şahin, Xuemeng Tian, Martijn Witjes, Lindsey Sloat, Tomislav Hengl

PeerJ 12: e18585. Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11624844/pdf/

Citations: 0

Abstract

Processing large collections of earth observation (EO) time-series, such as those from NASA's Landsat and ESA's Sentinel missions, which are often petabyte-sized, can be computationally prohibitive and costly. Despite their name, even the Analysis Ready Data (ARD) versions of such collections can rarely be used as direct input for modeling because of cloud presence and/or prohibitive storage size. Existing solutions for readily using these data are not openly available, perform poorly, or lack flexibility. To address this issue, we developed TSIRF (Time-Series Iteration-free Reconstruction Framework), a computational framework that can apply diverse time-series processing tasks, such as temporal aggregation and time-series reconstruction, simply by adjusting the convolution kernel. In its first large-scale application, TSIRF was used to process the entire Global Land Analysis and Discovery (GLAD) ARD Landsat archive, producing a cloud-free bi-monthly aggregated product. This processing covered seven Landsat bands globally from 1997 to 2022, with more than two trillion pixels, each carrying a time-series of 156 samples in the aggregated product, and required approximately 28 hours of computation using 1248 Intel® Xeon® Gold 6248R CPUs. The quality of the result was assessed using a benchmark dataset derived from the aggregated product and by comparing different imputation strategies. The resulting reconstructed images can be used as input for machine learning models or to map biophysical indices. To further limit storage size, the produced data were saved as 8-bit Cloud-Optimized GeoTIFFs (COG). At about 20 TB per band/index for an entire 30 m resolution bi-monthly historical time-series distributed as open data, the product enables seamless, fast, and affordable access to the Landsat archive for environmental monitoring and analysis applications.
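
The abstract states that TSIRF switches between tasks such as temporal aggregation and reconstruction only by changing the convolution kernel. The sketch below illustrates that idea with normalized (mask-weighted) convolution in NumPy: cloud-covered samples are marked as NaN, the weighted sum of valid neighbors is divided by the total weight that landed on valid samples, and no iterative solver is involved. This is an assumption-based illustration, not the authors' implementation; the function names `masked_convolution`, `box_kernel`, and `exp_kernel` and the parameters `half_width` and `tau` are hypothetical.

```python
import numpy as np


def box_kernel(half_width: int) -> np.ndarray:
    """Uniform weights: a moving average over valid samples (hypothetical choice)."""
    return np.ones(2 * half_width + 1)


def exp_kernel(half_width: int, tau: float) -> np.ndarray:
    """Symmetric exponential decay: closer observations weigh more (hypothetical choice)."""
    offsets = np.arange(-half_width, half_width + 1)
    return np.exp(-np.abs(offsets) / tau)


def masked_convolution(values, kernel) -> np.ndarray:
    """Iteration-free gap filling of a 1-D series by normalized convolution.

    values: float array; NaN marks missing (e.g. cloud-covered) samples.
    kernel: non-negative, odd-length, symmetric weights centred on the target step.
    Steps with no valid sample inside the kernel window remain NaN.
    """
    values = np.asarray(values, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    if kernel.size % 2 == 0 or values.size < kernel.size:
        raise ValueError("kernel must be odd-length and no longer than the series")
    valid = ~np.isnan(values)
    filled = np.where(valid, values, 0.0)
    # Weighted sum of the available observations around each time step.
    num = np.convolve(filled, kernel, mode="same")
    # Total weight that actually fell on valid observations.
    den = np.convolve(valid.astype(float), kernel, mode="same")
    out = np.full(values.shape, np.nan)
    np.divide(num, den, out=out, where=den > 0)
    return out


if __name__ == "__main__":
    series = np.array([1.0, np.nan, 3.0, np.nan, 5.0])
    print(masked_convolution(series, box_kernel(1)))  # [1. 2. 3. 4. 5.]
    print(masked_convolution(series, exp_kernel(2, 1.5)))
```

Only the kernel differs between the two calls: a box kernel behaves like a moving-average composite, while a decaying kernel favors the nearest clear observations when filling a gap. For scale, the 156 samples per pixel quoted above correspond to 26 years (1997 to 2022 inclusive) × 6 bi-monthly periods.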
