A Spark-Based Platform to Extract Phenological Information from Satellite Images

Viktor Bakayov, R. Goncalves, R. Zurita-Milla, E. Izquierdo-Verdiguier
2018 IEEE 14th International Conference on e-Science (e-Science), pp. 354-355, October 2018. DOI: 10.1109/eScience.2018.00095

Abstract

Phenology is the study of periodic plant and animal life cycle events and how these are influenced by seasonal and inter-annual variations in weather and climate, as well as by other environmental factors. Time series of remote sensing (RS) images can be used to characterize land surface phenology at continental to global scales. For this, the RS images are typically transformed into vegetation indices (VIs) such as the normalized difference vegetation index (NDVI) or the enhanced vegetation index (EVI). These indices can then be used to extract various phenological metrics. In our previous work we used cloud computing to generate temperature-based phenological indices [1], [2], and to relate one phenological metric, namely the Start-of-Season (SOS), to those indices [3], [4]. Here we present an extension of that work in which we use a Spark-based platform to efficiently extract phenological metrics from time series of NDVI and EVI. This platform makes it possible to obtain and analyze high spatial resolution metrics (in this case 1 km) from 10-day composites. The platform uses the same architecture as in [3], i.e., it is organized into three layers: a storage layer, a processing layer, and JupyterHub services for user interaction. It is designed to store the data in well-known file formats such as GeoTIFF and Hierarchical Data Format (HDF). For the data analysis, the user expresses the operations in Jupyter notebooks as Python, R, or Scala code (Fig. 1). Hence, with a browser and a remote connection, the user can express a research question and/or collect insights from large data sets. All computations are pushed down to the computational platform, and the results are fetched back for data visualization. To extract the phenological metrics, we rely on TimeSat [5]. TimeSat is a software package that can be used to fit a function (e.g., a double logistic) to time series of VIs. After that, it uses various approaches to extract vegetation seasonality metrics such as SOS.
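The chain described above (reflectance to vegetation index, a fitted seasonal curve, then an SOS estimate) can be illustrated with a minimal Python sketch. It is not TimeSat code. The NDVI and EVI formulas are the standard ones; the double-logistic parameterization, the function names, and the amplitude-threshold rule for SOS (here 50% of the seasonal amplitude) are assumptions chosen for illustration, and the curve is evaluated with fixed parameters rather than fitted.

```python
import math


def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    return (nir - red) / (nir + red)


def evi(nir, red, blue):
    """Enhanced vegetation index with the standard coefficients."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)


def double_logistic(t, base, amp, rise_mid, rise_rate, fall_mid, fall_rate):
    """One common double-logistic form: a rise near rise_mid, a fall near fall_mid.

    The parameter names are hypothetical; TimeSat may parameterize differently.
    """
    rise = 1.0 / (1.0 + math.exp(-rise_rate * (t - rise_mid)))
    fall = 1.0 / (1.0 + math.exp(-fall_rate * (t - fall_mid)))
    return base + amp * (rise - fall)


def start_of_season(days, values, fraction=0.5):
    """Day on the rising limb where the curve reaches min + fraction * (max - min).

    Assumed threshold rule; linear interpolation between neighboring samples.
    Returns None if no crossing is found before the seasonal peak.
    """
    lo, hi = min(values), max(values)
    threshold = lo + fraction * (hi - lo)
    peak = values.index(hi)
    for i in range(1, peak + 1):
        if values[i - 1] < threshold <= values[i]:
            step = (threshold - values[i - 1]) / (values[i] - values[i - 1])
            return days[i - 1] + step * (days[i] - days[i - 1])
    return None


# Synthetic 10-day composite series for one pixel (illustrative values only).
days = list(range(1, 366, 10))
series = [double_logistic(d, 0.2, 0.6, 120.0, 0.1, 280.0, 0.1) for d in days]
sos_day = start_of_season(days, series)
```

With these synthetic parameters the estimate lands close to the rising midpoint at day 120. Changing `fraction`, or swapping in a different curve, shifts the estimate, which is the kind of method dependence the abstract reports.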
The program's numerical and graphical routines are coded in Matlab and Fortran. These routines are highly vectorized and efficient for use with large data sets. However, distributed processing is required to determine SOS at continental scales. Through an efficient partitioning of the data and Spark’s scheduling policies, these single-core routines are scheduled for parallel execution over multiple machines. The study evaluates which VIs and fitting functions are most suitable for certain vegetation types by comparing the SOS metrics to volunteered phenological observations curated by the USA National Phenology Network [6]. Our preliminary results show that the SOS can differ by up to 20-30 days depending on the fitting function, the VI, and the approach used to extract the SOS metric. In the South, SOS falls around mid-February or March, whereas in mountainous regions and the North it can be as late as June-July. We will further evaluate how our results compare to the volunteered ground observations. This work is thus a first stepping stone towards being able to systematically analyze and map the impact of climate change on the seasonality of plants. Our tests show that the platform is scalable and can be extended to work with even higher resolution VIs, such as those that can be derived from Sentinel-2 images (10 m resolution). Because of this, our work opens the door to studies at continental to global scales, and to the use of high and very high spatial resolution data.
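The partition-and-schedule idea mentioned above can be sketched locally in Python. This is only an illustration under stated assumptions: a thread pool from the standard library stands in for Spark's scheduler, `first_rise_index` is a hypothetical placeholder for a single-core routine (it is not a TimeSat function), and the chunking scheme is an assumption rather than the platform's actual partitioning.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(pixels, n_parts):
    """Split a list of (pixel_id, series) pairs into at most n_parts chunks."""
    if not pixels or n_parts < 1:
        return []
    size = -(-len(pixels) // n_parts)  # ceiling division
    return [pixels[i:i + size] for i in range(0, len(pixels), size)]


def first_rise_index(series, fraction=0.5):
    """Hypothetical single-core routine: index of the first sample that
    reaches min + fraction * (max - min)."""
    lo, hi = min(series), max(series)
    threshold = lo + fraction * (hi - lo)
    for i, value in enumerate(series):
        if value >= threshold:
            return i
    return None


def process_chunk(chunk):
    """Run the single-core routine on every pixel of one partition."""
    return [(pixel_id, first_rise_index(series)) for pixel_id, series in chunk]


def run_partitioned(pixels, n_parts=4, workers=4):
    """Map process_chunk over the partitions in parallel and merge the results."""
    chunks = partition(pixels, n_parts)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Collect inside the with-block so all work finishes before shutdown.
        results = list(pool.map(process_chunk, chunks))
    return {pid: idx for chunk_result in results for pid, idx in chunk_result}


# Synthetic pixels: pixel i stays low for i samples, then greens up.
pixels = [(i, [0.2] * i + [0.8] * 5) for i in range(10)]
sos_index = run_partitioned(pixels, n_parts=3, workers=2)
```

On a Spark cluster the same pattern would typically be expressed by distributing the chunks with `SparkContext.parallelize` and applying `process_chunk` through `RDD.map`, so that each routine still runs on a single core while the scheduler spreads partitions across machines.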