Confluence: Adaptive Spatiotemporal Data Integration Using Distributed Query Relaxation over Heterogeneous Observational Datasets
Saptashwa Mitra, S. Pallickara
2018 IEEE/ACM 11th International Conference on Utility and Cloud Computing (UCC), 2018-12-01
DOI: 10.1109/UCC.2018.00027 (https://doi.org/10.1109/UCC.2018.00027)
Combining data from disparate sources makes it possible to explore different aspects of the phenomena under consideration. Doing so effectively raises several challenges, including heterogeneity in data representation and format, differing collection patterns, and the integration of foreign data attributes in a ready-to-use condition. In this study, we design Confluence, a scalable, query-oriented, distributed data integration framework. When data points are spatiotemporally misaligned, Confluence dynamically generates accurate interpolations for the targeted spatiotemporal scopes, along with an estimate of the uncertainty involved in each interpolation. Confluence efficiently orchestrates computations to evaluate spatiotemporal query joins and supports distributed query evaluation with dynamic relaxation of query constraints. Query evaluations are locality-aware, and we leverage model-based dynamic parameter selection to provide accurate estimates for data points. Our empirical benchmarks profile the system in terms of accuracy, latency, and throughput at scale, and demonstrate improved performance over GeoMesa, a Spark-based geospatial analytics framework, in a distributed cloud computing environment.
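
The abstract does not say which interpolation method or relaxation policy Confluence uses, so the following is only a minimal single-machine sketch of the general idea, not the authors' implementation. It assumes a simple policy (double the spatial and temporal window until enough observations match) and swaps in inverse distance weighting (IDW) as the interpolator, reporting the weighted standard deviation of the neighbours as a rough uncertainty. All names (`Obs`, `relaxed_neighbors`, `idw_estimate`) and all parameters are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Obs:
    """One observation; field names are illustrative assumptions."""
    lat: float
    lon: float
    t: float       # time, in arbitrary units (e.g. hours)
    value: float


def in_scope(o, lat, lon, t, d_space, d_time):
    """True if the observation lies inside the query's spatiotemporal box."""
    return (abs(o.lat - lat) <= d_space
            and abs(o.lon - lon) <= d_space
            and abs(o.t - t) <= d_time)


def relaxed_neighbors(obs, lat, lon, t, d_space, d_time,
                      min_points=3, factor=2.0, max_rounds=4):
    """Assumed relaxation policy: widen the window by `factor` until at
    least `min_points` observations match or `max_rounds` is reached.
    Returns the matches and the window that was actually used."""
    for round_no in range(max_rounds + 1):
        hits = [o for o in obs if in_scope(o, lat, lon, t, d_space, d_time)]
        if len(hits) >= min_points or round_no == max_rounds:
            return hits, d_space, d_time
        d_space *= factor
        d_time *= factor
    return [], d_space, d_time  # only reached if max_rounds is negative


def idw_estimate(hits, lat, lon, t, time_scale=1.0, power=2.0):
    """Inverse distance weighting over space and (scaled) time.
    Returns (estimate, uncertainty), where uncertainty is the weighted
    standard deviation of the neighbouring values."""
    if not hits:
        raise ValueError("no observations to interpolate from")
    weights = []
    for o in hits:
        dist = math.sqrt((o.lat - lat) ** 2 + (o.lon - lon) ** 2
                         + (time_scale * (o.t - t)) ** 2)
        if dist == 0.0:
            return o.value, 0.0  # exact match: no interpolation needed
        weights.append(1.0 / dist ** power)
    total = sum(weights)
    mean = sum(w * o.value for w, o in zip(weights, hits)) / total
    var = sum(w * (o.value - mean) ** 2 for w, o in zip(weights, hits)) / total
    return mean, math.sqrt(var)


if __name__ == "__main__":
    data = [Obs(0.0, 1.0, 0.0, 10.0),
            Obs(0.0, -1.0, 0.0, 20.0),
            Obs(5.0, 0.0, 0.0, 100.0)]
    hits, ds, dt = relaxed_neighbors(data, 0.0, 0.0, 0.0,
                                     d_space=0.5, d_time=1.0, min_points=2)
    print(len(hits), ds, dt, idw_estimate(hits, 0.0, 0.0, 0.0))
```

In this sketch the initial window matches nothing, so one relaxation round doubles it and picks up the two nearby observations while still excluding the distant one. A distributed system would additionally partition the observations and evaluate each window close to where the data resides, which this example does not attempt.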