Puya Memarzia, Maria Patrou, M. Alam, S. Ray, V. Bhavsar, K. Kent
2019 20th IEEE International Conference on Mobile Data Management (MDM), published 2019-06-10. DOI: 10.1109/MDM.2019.00-66
Toward Efficient Processing of Spatio-Temporal Workloads in a Distributed In-Memory System
Location-based services (LBS) are a widely adopted technology that produces large volumes of spatio-temporal data at high velocity. Spatial data is also being generated by many other geo-spatial applications. To address the challenge of data volume, a number of big spatial data management systems based on the MapReduce paradigm have emerged. Recent projects have developed spatial data systems using Spark's distributed in-memory architecture. These projects, which include GeoSpark, SpatialSpark, and LocationSpark, do not support the high update rates required by LBS applications. Alternatively, systems such as MD-HBase support data updates, but are hindered by the performance characteristics of HBase, a disk-oriented framework. We present DISTIL+, a distributed spatio-temporal data processing system designed for high-velocity location data. Our system achieves high update throughput and low query latency by leveraging the APGAS (Asynchronous Partitioned Global Address Space) architecture to build a multi-level distributed in-memory index. We present an extensive experimental evaluation of our system, comparing several indexing and data placement schemes, as well as competing systems. Our results show that DISTIL+ excels at supporting high-throughput location updates as well as low-latency spatio-temporal range and kNN queries, while offering better performance than existing approaches.
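To make the idea of a multi-level distributed in-memory index more concrete, here is a minimal Python sketch. It is not the DISTIL+ implementation. It assumes a hypothetical two-level layout: a first level that splits the x-axis into equal strips, one per partition (standing in for an APGAS place), and a second level inside each partition that buckets objects into a uniform grid. The class names `LocalGrid` and `TwoLevelIndex`, the strip partitioning, and the cell size are illustrative assumptions. Everything runs sequentially in one process; the loops over partitions stand in for the asynchronous remote calls an APGAS runtime would issue.

```python
import heapq
import math
from collections import defaultdict


class LocalGrid:
    """Hypothetical second-level index held by one partition: a uniform grid."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(dict)  # (cx, cy) -> {obj_id: (x, y, t)}
        self.where = {}                 # obj_id -> cell key, so updates avoid scans

    def _key(self, x, y):
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def upsert(self, obj_id, x, y, t):
        # Replace the object's previous position (if any) with the new one.
        old = self.where.get(obj_id)
        if old is not None:
            del self.cells[old][obj_id]
        key = self._key(x, y)
        self.cells[key][obj_id] = (x, y, t)
        self.where[obj_id] = key

    def remove(self, obj_id):
        key = self.where.pop(obj_id, None)
        if key is not None:
            del self.cells[key][obj_id]

    def range_query(self, xmin, ymin, xmax, ymax, tmin, tmax):
        # Visit only grid cells overlapping the rectangle, then filter exactly.
        kx0, ky0 = self._key(xmin, ymin)
        kx1, ky1 = self._key(xmax, ymax)
        out = []
        for cx in range(kx0, kx1 + 1):
            for cy in range(ky0, ky1 + 1):
                for oid, (x, y, t) in self.cells.get((cx, cy), {}).items():
                    if xmin <= x <= xmax and ymin <= y <= ymax and tmin <= t <= tmax:
                        out.append(oid)
        return out


class TwoLevelIndex:
    """Hypothetical first level: x in [0, width) split into one strip per partition."""

    def __init__(self, width, num_partitions, cell_size):
        self.strip = width / num_partitions
        self.parts = [LocalGrid(cell_size) for _ in range(num_partitions)]
        self.owner = {}  # obj_id -> index of the partition currently holding it

    def _part(self, x):
        # Clamp so coordinates on or outside the boundary map to an edge partition.
        return max(0, min(int(x // self.strip), len(self.parts) - 1))

    def update(self, obj_id, x, y, t):
        p = self._part(x)
        old = self.owner.get(obj_id)
        if old is not None and old != p:
            self.parts[old].remove(obj_id)  # object migrated to another partition
        self.parts[p].upsert(obj_id, x, y, t)
        self.owner[obj_id] = p

    def range_query(self, xmin, ymin, xmax, ymax, tmin, tmax):
        # Contact only partitions whose strip overlaps the query rectangle.
        hits = []
        for p in range(self._part(xmin), self._part(xmax) + 1):
            hits.extend(self.parts[p].range_query(xmin, ymin, xmax, ymax, tmin, tmax))
        return sorted(hits)

    def knn(self, qx, qy, k):
        # Naive kNN: score every object in every partition, keep the k nearest.
        cand = []
        for part in self.parts:
            for cell in part.cells.values():
                for oid, (x, y, _t) in cell.items():
                    cand.append((math.hypot(x - qx, y - qy), oid))
        return [oid for _d, oid in heapq.nsmallest(k, cand)]


if __name__ == "__main__":
    idx = TwoLevelIndex(width=100.0, num_partitions=4, cell_size=5.0)
    idx.update("a", 10, 10, 1)
    idx.update("b", 30, 12, 2)
    idx.update("a", 60, 10, 3)  # "a" moves from partition 0 to partition 2
    print(idx.range_query(0, 0, 40, 20, 0, 10))  # ['b']
    print(idx.knn(55, 10, 1))                    # ['a']
```

One design point worth noting: keeping an `obj_id -> cell` map in each partition lets a location update relocate an object in constant time instead of searching the grid, which is the kind of property a high-update-rate workload depends on. The `knn` method above simply scans every partition; a practical system would prune partitions and cells by distance before scanning.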