Bogdan Simion, Daniel N. Ilha, Angela Demke Brown, Ryan Johnson
{"title":"The price of generality in spatial indexing","authors":"Bogdan Simion, Daniel N. Ilha, Angela Demke Brown, Ryan Johnson","doi":"10.1145/2534921.2534923","DOIUrl":"https://doi.org/10.1145/2534921.2534923","url":null,"abstract":"Efficient indexing can significantly speed up the processing of large volumes of spatial data in many BigData applications. Many emerging spatial applications (e.g., biomedical imaging and genome analysis) have varying indexing requirements, so a unified indexing infrastructure that allows new indexing schemes to be implemented without knowledge of database internals is beneficial. However, designing a generic indexing framework is a challenging task. We study the issues with general indexing schemes such as GiST (used in PostGIS) and expose the tradeoff between generality and performance, showing that generality can be severely detrimental to performance if the abstractions are not carefully designed. Our experiments indicate that the GiST framework, as implemented in PostgreSQL/PostGIS, is 4.5-6x slower at filtering records through the index than a custom R-tree implementation. We also isolate the GiST-specific overhead by implementing the framework outside the DBMS, showing that the GiST-based R-tree is up to 2x slower than the raw R-tree algorithm that it uses internally. 
We conclude that although a generic framework for a wide range of spatial BigData application domains is desirable, implementers of new frameworks need to be careful in designing the abstractions to avoid paying a hefty performance penalty.","PeriodicalId":416086,"journal":{"name":"International Workshop on Analytics for Big Geospatial Data","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131455327","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
P. Baumann, A. Dumitru, Vlad Merticariu, D. Misev, M. Rusu
{"title":"Breaking the big data barrier by enhancing on-board sensor flexibility","authors":"P. Baumann, A. Dumitru, Vlad Merticariu, D. Misev, M. Rusu","doi":"10.1145/2534921.2534926","DOIUrl":"https://doi.org/10.1145/2534921.2534926","url":null,"abstract":"Modern sensors, such as hyperspectral cameras, deliver massive amounts of data. On board satellites, this high volume is paired with low bandwidth and availability only during overpasses. This leads to well-known availability problems and bottlenecks in today's remote sensing. We address this challenge by enhancing the on-board system with flexible filtering and processing capabilities based on the array analytics engine rasdaman. Users can then request exactly the data they need, which can substantially decrease data traffic. Our project has been accepted for a CubeSat mission, for which rasdaman has now been prepared. We present the project setup and the core extensions made to rasdaman to this end.","PeriodicalId":416086,"journal":{"name":"International Workshop on Analytics for Big Geospatial Data","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124013481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jiaoyan Chen, Huajun Chen, Jeff Z. Pan, Ming Wu, Ningyu Zhang, Guozhou Zheng
{"title":"When big data meets big smog: a big spatio-temporal data framework for China severe smog analysis","authors":"Jiaoyan Chen, Huajun Chen, Jeff Z. Pan, Ming Wu, Ningyu Zhang, Guozhou Zheng","doi":"10.1145/2534921.2534924","DOIUrl":"https://doi.org/10.1145/2534921.2534924","url":null,"abstract":"Recently, severe smog has been afflicting many cities in China, including the capital, Beijing. The chief culprit of China's smog, namely PM2.5, is affected by various factors including air pollutants, weather, climate, geographical location, urbanization, etc. To analyze these factors, we collect about 35,000,000 air quality records and about 30,000,000 weather records from sensors in 77 Chinese cities in 2013. Two big data sets, Geoname and DBPedia, are also incorporated for data on climate, geographical location and urbanization. To deal with big spatio-temporal data for big smog analysis, we propose a MapReduce-based framework named BigSmog. It mainly conducts parallel correlation analysis of the factors and scalable training of artificial neural networks (ANNs) for spatio-temporal approximation of the concentration of PM2.5. In the experiments, BigSmog displays high scalability for big smog analysis with big spatio-temporal data. The analysis results show that air pollutants influence the short-term concentration of PM2.5 more than the weather does, and that geographical location and climate, rather than urbanization, play a major role in determining a city's long-term PM2.5 pollution level. 
Moreover, the trained ANNs can accurately approximate the concentration of PM2.5.","PeriodicalId":416086,"journal":{"name":"International Workshop on Analytics for Big Geospatial Data","volume":"266 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133280037","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallel spatial query processing on GPUs using R-trees","authors":"Simin You, Jianting Zhang, L. Gruenwald","doi":"10.1145/2534921.2534949","DOIUrl":"https://doi.org/10.1145/2534921.2534949","url":null,"abstract":"R-Trees are a popular spatial indexing technique that has been widely adopted in many geospatial applications. As commodity GPUs (Graphics Processing Units) are increasingly becoming available on personal workstations and cluster computers, there is considerable research interest in applying massively data-parallel GPGPU (General Purpose computing on GPUs) technologies to index and query large-scale geospatial data on GPUs using R-Trees. In this study, we aim to evaluate the potential of accelerating both R-Tree bulk loading and spatial window query processing on GPUs. In addition to designing an efficient data layout schema for R-Trees on GPUs, we have implemented several parallel spatial window query processing techniques on GPUs using both dynamically generated R-Trees constructed on CPUs and bulk loaded R-Trees constructed on GPUs. 
Extensive experiments using both synthetic and real-world datasets have shown that our GPU based parallel query processing techniques using R-Trees can achieve about 10X speedups on average over 8-core CPU parallel implementations by effectively utilizing large numbers of processors and high memory bandwidth on GPUs.","PeriodicalId":416086,"journal":{"name":"International Workshop on Analytics for Big Geospatial Data","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121605310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
S. Prasad, S. Shekhar, Michael McDermott, Xun Zhou, Michael R. Evans, S. Puri
{"title":"GPGPU-accelerated interesting interval discovery and other computations on GeoSpatial datasets: a summary of results","authors":"S. Prasad, S. Shekhar, Michael McDermott, Xun Zhou, Michael R. Evans, S. Puri","doi":"10.1145/2534921.2535837","DOIUrl":"https://doi.org/10.1145/2534921.2535837","url":null,"abstract":"It is imperative that scalable solutions for GIS computations fully exploit the modern hybrid architecture comprising a CPU-GPU pair. Existing parallel algorithms and data structures port reasonably well to multi-core CPUs, but poorly to GPGPUs because of the latter's atypical fine-grained, single-instruction multiple-thread (SIMT) architecture, extreme memory hierarchy and coalesced access requirements, and delicate CPU-GPU coordination. Recently, our parallelization of the state-of-the-art interesting sequence discovery algorithms calculated one-dimensional interesting intervals over an image representing the normalized difference vegetation indices of Africa within 31 ms on an nVidia 480GTX. To our knowledge, this paper reports the first parallelization of these algorithms. This allowed us to process 612 images representing biweekly data from July 1981 through Dec 2006 within 22 seconds. We were also able to pipe the output to a display in almost real-time, which would interest climate scientists. We have also undertaken parallelization of two key tree-based data structures, namely R-tree and heap, and have employed a parallel R-tree in a polygon overlay system. 
Parallelizing these data structures is hard because the underlying tree topology and the fine-grained computation lead to frequent accesses to the data structures, severely stifling parallel efficiency.","PeriodicalId":416086,"journal":{"name":"International Workshop on Analytics for Big Geospatial Data","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114579851","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A cluster-based morphological filter for geospatial data analysis","authors":"Zheng Cui, Keqi Zhang, Chengcui Zhang, Shu‐Ching Chen","doi":"10.1145/2534921.2534922","DOIUrl":"https://doi.org/10.1145/2534921.2534922","url":null,"abstract":"LIDAR (Light Detection and Ranging) is a widely used technology for measuring terrain properties and for topographic mapping. Many filtering methods have been developed to process the geospatial data generated by LIDAR into bare-earth digital terrain models. Among these methods, mathematical morphological filtering is a very effective and efficient way to separate ground and non-ground objects in LIDAR data. This method achieves ideal results in flat terrain, but does not work well in undulating and complex terrain with large non-ground objects. The reason is that, when a large filtering window is used to filter out large non-ground objects, ground terrain objects are removed along with them. In mountainous terrain especially, this causes the hill cut-off problem, which is common to morphological filters. In this paper, a cluster-based morphological filter is proposed to improve the progressive morphological filter and make it work better on more undulating and complex terrain types. 
The filtering results demonstrate that the proposed method is able to effectively preserve terrain ground objects and remove large non-ground objects.","PeriodicalId":416086,"journal":{"name":"International Workshop on Analytics for Big Geospatial Data","volume":"550 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129337924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Object based image classification: state of the art and computational challenges","authors":"Ranga Raju Vatsavai","doi":"10.1145/2534921.2534927","DOIUrl":"https://doi.org/10.1145/2534921.2534927","url":null,"abstract":"As the spatial resolution of satellite remote sensing imagery advances towards sub-meter, the predominantly pixel-based (or single-instance) classification methods need to be redesigned to take advantage of the spatial and structural patterns found in very high resolution imagery. In this work, we look at the advantages of object-based image analysis methods through the newer multiple instance learning schemes. We analyze these methods in the context of big geospatial data and point readers to some of the outstanding computational challenges.","PeriodicalId":416086,"journal":{"name":"International Workshop on Analytics for Big Geospatial Data","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130132397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"P2EST: parallelization philosophies for evaluating spatio-temporal queries","authors":"Xiling Sun, Anan Yaagoub, Goce Trajcevski, P. Scheuermann, Hao Chen, Abhinav Kachhwaha","doi":"10.1145/2534921.2534929","DOIUrl":"https://doi.org/10.1145/2534921.2534929","url":null,"abstract":"This work considers the impact of different contexts when attempting to exploit parallelization approaches for processing continuous spatio-temporal queries. More specifically, we are interested in the various trade-offs that may arise from differences between computing environments, for example multicore vs. cloud. Algorithmic solutions for parallel processing of spatio-temporal queries cater to splitting the load among units - be it based on the data or the query (or both) - relying to a greater or lesser degree on a certain set of features of a given environment. We postulate that incorporating the service-features should be coupled with the algorithms/heuristics for processing particular queries, in addition to the volume of the data. We present the current version of the implementation of our P2EST system and analyze the execution of different heuristics for parallel processing of spatio-temporal range queries.","PeriodicalId":416086,"journal":{"name":"International Workshop on Analytics for Big Geospatial Data","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127681534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
S. Ayhan, J. Pesce, P. Comitz, G. Gerberick, S. Bliesner
{"title":"Predictive analytics with surveillance big data","authors":"S. Ayhan, J. Pesce, P. Comitz, G. Gerberick, S. Bliesner","doi":"10.1145/2447481.2447491","DOIUrl":"https://doi.org/10.1145/2447481.2447491","url":null,"abstract":"In this paper, we describe a novel analytics system that enables query processing and predictive analytics over streams of aviation data. As part of an Internal Research and Development project, Boeing Research and Technology (BR&T) Advanced Air Traffic Management (AATM) built a system that makes predictions based upon descriptive patterns of archived aviation data. Boeing AATM has been receiving live Aircraft Situation Display to Industry (ASDI) data and archiving it for over two years. At present, there is no easy mechanism to perform analytics on the data. The incoming ASDI data is large, compressed, and requires correlation with other flight data before it can be analyzed. The service exposes this data once it has been uncompressed, correlated, and stored in a data warehouse for further analysis using a variety of descriptive, predictive, and possibly prescriptive analytics tools. The service is being built partially in response to requests from Boeing Commercial Aviation (BCA) for analysis of capacity and flow in the US National Airspace System (NAS). The service utilizes a custom tool for correlating the raw ASDI feed, IBM Warehouse with DB2 for data management, WebSphere Message Broker for real-time message brokering, SPSS Modeler for statistical analysis, and Cognos BI for front-end business intelligence (BI) visualization. 
This paper describes a scalable service architecture, implementation and the value it adds to the aviation domain.","PeriodicalId":416086,"journal":{"name":"International Workshop on Analytics for Big Geospatial Data","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125733589","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Elastic and effective spatio-temporal query processing scheme on Hadoop","authors":"Yunqin Zhong, Xiaomin Zhu, Jinyun Fang","doi":"10.1145/2447481.2447486","DOIUrl":"https://doi.org/10.1145/2447481.2447486","url":null,"abstract":"Geospatial applications have become prevalent in both scientific research and industry. Spatio-temporal query processing is a fundamental issue for driving geospatial applications. However, state-of-the-art spatio-temporal query processing methods are facing significant challenges as the data expand and the number of concurrent users increases. In this paper we present a novel spatio-temporal querying scheme to provide efficient query processing over big geospatial data. The scheme improves query efficiency in three respects. Firstly, taking geographic proximity and storage locality into consideration, we propose a geospatial data organization approach to achieve high aggregate I/O throughput, and design a distributed indexing framework for efficient pruning of the search space. Furthermore, we design an indexing plus MapReduce query processing architecture to improve data retrieval efficiency and query computation efficiency. In addition, we design a distributed caching model to accelerate the access response of hotspot spatial objects. 
We evaluate the effectiveness of our scheme with comprehensive experiments using real datasets and application scenarios.","PeriodicalId":416086,"journal":{"name":"International Workshop on Analytics for Big Geospatial Data","volume":"701 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122970965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}