{"title":"GloBiMapsAI: An AI-Enhanced Probabilistic Data Structure for Global Raster Datasets","authors":"M. Werner","doi":"10.1145/3453184","DOIUrl":null,"url":null,"abstract":"In the last decade, more and more spatial data has been acquired on a global scale due to satellite missions, social media, and coordinated governmental activities. This observational data suffers from huge storage footprints and makes global analysis challenging. Therefore, many information products have been designed in which observations are turned into global maps showing features such as land cover or land use, often with only a few discrete values and sparse spatial coverage like only within cities. Traditional coding of such data as a raster image becomes challenging due to the sizes of the datasets and spatially non-local access patterns, for example, when labeling social media streams. This article proposes GloBiMap, a randomized data structure, based on Bloom filters, for modeling low-cardinality sparse raster images of excessive sizes in a configurable amount of memory with pure random access operations avoiding costly intermediate decompression. In addition, the data structure is designed to correct the inevitable errors of the randomized layer in order to have a fully exact representation. We show the feasibility of the approach on several real-world datasets including the Global Urban Footprint in which each pixel denotes whether a particular location contains a building at a resolution of roughly 10m globally as well as on a global Twitter sample of more than 220 million precisely geolocated tweets. In addition, we propose the integration of a denoiser engine based on artificial intelligence in order to reduce the amount of error correction information for extremely compressive GloBiMaps.","PeriodicalId":43641,"journal":{"name":"ACM Transactions on Spatial Algorithms and Systems","volume":" ","pages":"1 - 24"},"PeriodicalIF":1.2000,"publicationDate":"2021-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/3453184","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Spatial Algorithms and Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3453184","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"REMOTE SENSING","Score":null,"Total":0}
引用次数: 1
Abstract
In the last decade, more and more spatial data has been acquired on a global scale due to satellite missions, social media, and coordinated governmental activities. This observational data suffers from huge storage footprints and makes global analysis challenging. Therefore, many information products have been designed in which observations are turned into global maps showing features such as land cover or land use, often with only a few discrete values and sparse spatial coverage like only within cities. Traditional coding of such data as a raster image becomes challenging due to the sizes of the datasets and spatially non-local access patterns, for example, when labeling social media streams. This article proposes GloBiMap, a randomized data structure, based on Bloom filters, for modeling low-cardinality sparse raster images of excessive sizes in a configurable amount of memory with pure random access operations avoiding costly intermediate decompression. In addition, the data structure is designed to correct the inevitable errors of the randomized layer in order to have a fully exact representation. We show the feasibility of the approach on several real-world datasets including the Global Urban Footprint in which each pixel denotes whether a particular location contains a building at a resolution of roughly 10m globally as well as on a global Twitter sample of more than 220 million precisely geolocated tweets. In addition, we propose the integration of a denoiser engine based on artificial intelligence in order to reduce the amount of error correction information for extremely compressive GloBiMaps.
期刊介绍:
ACM Transactions on Spatial Algorithms and Systems (TSAS) is a scholarly journal that publishes the highest quality papers on all aspects of spatial algorithms and systems and closely related disciplines. It has a multi-disciplinary perspective in that it spans a large number of areas where spatial data is manipulated or visualized (regardless of how it is specified - i.e., geometrically or textually) such as geography, geographic information systems (GIS), geospatial and spatiotemporal databases, spatial and metric indexing, location-based services, web-based spatial applications, geographic information retrieval (GIR), spatial reasoning and mining, security and privacy, as well as the related visual computing areas of computer graphics, computer vision, geometric modeling, and visualization where the spatial, geospatial, and spatiotemporal data is central.