Towards massive spatial data validation with SpatialHadoop

S. Migliorini, A. Belussi, Mauro Negri, G. Pelagatti
Published in: International Workshop on Analytics for Big Geospatial Data, 2016-10-31
DOI: 10.1145/3006386.3006392 (https://doi.org/10.1145/3006386.3006392)
Citations: 10

Abstract

Spatial data usually encapsulate a semantic characterization of features that carries important meaning and relations among objects, such as the containment relation between the extent of a region and those of its constituent parts. The GeoUML methodology bridges the gap between the definition of spatial integrity constraints at the conceptual level and the realization of validation procedures. In particular, it automatically generates SQL validation queries from a conceptual specification, using predefined SQL templates. These queries can be used to check data stored in spatial relational databases, such as PostGIS.

However, quality requirements and the amount of available data are growing considerably, making the execution of these validation procedures infeasible. The map-reduce paradigm can be applied effectively in this context, since the same test can be performed in parallel on different data chunks and the partial results can then be combined to obtain the final set of violating objects. Pigeon is a data-flow language defined on top of SpatialHadoop that provides spatial data types and functions. The aim of this paper is to explore the possibility of extending the GeoUML methodology by automatically producing Pigeon validation procedures from a set of predefined Pigeon macros. These scripts can be used in a map-reduce environment to make the validation of large datasets feasible.
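
The chunk-wise validation idea described in the abstract can be illustrated with a minimal Python sketch. This is not Pigeon, not the GeoUML templates, and not the authors' implementation. It assumes axis-aligned rectangles as stand-ins for real geometries, and the names Box, contains, validate_chunk, and validate are hypothetical helpers introduced only for this illustration. Each chunk of parts is checked against the same containment test in parallel (the map step), and the partial lists of violating objects are merged at the end (the reduce step).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle standing in for a real geometry (assumption)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def contains(outer: Box, inner: Box) -> bool:
    """True if `inner` lies entirely within `outer` (boundaries may touch)."""
    return (outer.xmin <= inner.xmin and outer.ymin <= inner.ymin
            and inner.xmax <= outer.xmax and inner.ymax <= outer.ymax)


def validate_chunk(chunk, regions):
    """Map step: return ids of parts in this chunk that violate containment."""
    violations = []
    for part_id, region_id, box in chunk:
        if not contains(regions[region_id], box):
            violations.append(part_id)
    return violations


def validate(parts, regions, chunk_size=2):
    """Split parts into chunks, check each chunk in parallel, merge results."""
    chunks = [parts[i:i + chunk_size] for i in range(0, len(parts), chunk_size)]
    with ThreadPoolExecutor() as pool:
        partial = pool.map(lambda c: validate_chunk(c, regions), chunks)
        # Reduce step: combine the partial violation lists into one sorted list.
        return sorted(v for result in partial for v in result)


# Hypothetical sample data: two regions and the parts declared to belong to them.
regions = {"R1": Box(0, 0, 10, 10), "R2": Box(20, 0, 30, 10)}
parts = [
    ("p1", "R1", Box(1, 1, 4, 4)),
    ("p2", "R1", Box(8, 8, 12, 9)),   # sticks out of R1
    ("p3", "R2", Box(21, 1, 29, 9)),
    ("p4", "R2", Box(0, 0, 5, 5)),    # lies outside R2
    ("p5", "R1", Box(0, 0, 10, 10)),  # coincides with R1, still contained
]
print(validate(parts, regions))  # prints ['p2', 'p4']
```

Because every chunk is tested independently, the result does not depend on how the parts are split. In a real deployment the rectangle test would be replaced by a true geometric predicate and the thread pool by a distributed map-reduce runtime.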