The Raptor Join Operator for Processing Big Raster + Vector Data

Samriddhi Singla, A. Eldawy, Tina Diao, Ayan Mukhopadhyay, E. Scudiero
Published in: Proceedings of the 29th International Conference on Advances in Geographic Information Systems, 2021-11-02
DOI: 10.1145/3474717.3483971 (https://doi.org/10.1145/3474717.3483971)
Citations: 5

Abstract

Pre-processing spatial data for machine learning applications often includes combining different datasets into a form usable by the machine learning algorithms. Spatial data is generally available in two representations, raster and vector. The best data science and machine learning applications need to combine multiple datasets of both representations, which is a data- and compute-intensive problem. This paper proposes a formal raster-vector join operator, Raptor Join, that can bridge the gap between raster and vector data. It is modeled as a relational join operator in Spark that can be easily combined with other operators, while also offering the advantage of in-situ processing. To implement the Raptor Join operator efficiently, we propose a novel Flash index that has a low memory requirement and can process the entire operation with one data scan. We run an extensive experimental evaluation on large-scale satellite data with up to a trillion pixels, and big vector data with up to hundreds of millions of segments and billions of points, and show that the proposed method can scale to big data with up to three orders of magnitude performance gain over baselines.
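
To make the idea of a raster-vector join concrete, the following is a minimal sketch of what such a join might return. It is not the Raptor Join algorithm and does not use the Flash index; the abstract does not describe those internals. It is a naive, baseline-style illustration that tests every pixel center against every polygon. The function names (`point_in_polygon`, `naive_raster_vector_join`), the output tuple layout, and the convention that pixel (col, row) covers the unit square starting at (col, row) are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]


def point_in_polygon(x: float, y: float, polygon: Polygon) -> bool:
    """Ray-casting test: count how many edges a ray in the +x direction crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def naive_raster_vector_join(
    raster: List[List[float]], polygons: Dict[str, Polygon]
) -> List[Tuple[str, int, int, float]]:
    """Hypothetical join output: (polygon_id, col, row, value) for each pixel
    whose center lies inside the polygon. Assumes grid and world coordinates
    coincide, which a real raster would replace with its georeferencing."""
    result = []
    for pid, poly in polygons.items():
        for row, line in enumerate(raster):
            for col, value in enumerate(line):
                if point_in_polygon(col + 0.5, row + 0.5, poly):
                    result.append((pid, col, row, value))
    return result


# A 3x4 toy raster and one square polygon covering its top-left 2x2 pixels.
raster = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]
polygons = {"a": [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]}
pairs = naive_raster_vector_join(raster, polygons)
```

This brute-force version performs one point-in-polygon test per pixel per polygon, which illustrates why the problem is described as data- and compute-intensive at the scale of a trillion pixels.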