A. Sarma, Hongrae Lee, Hector Gonzalez, J. Madhavan, A. Halevy
{"title":"为地图可视化而持续细化大型地理数据","authors":"A. Sarma, Hongrae Lee, Hector Gonzalez, J. Madhavan, A. Halevy","doi":"10.1145/2539032.2539034","DOIUrl":null,"url":null,"abstract":"Large-scale map visualization systems play an increasingly important role in presenting geographic datasets to end-users. Since these datasets can be extremely large, a map rendering system often needs to select a small fraction of the data to visualize them in a limited space. This article addresses the fundamental challenge of thinning: determining appropriate samples of data to be shown on specific geographical regions and zoom levels. Other than the sheer scale of the data, the thinning problem is challenging because of a number of other reasons: (1) data can consist of complex geographical shapes, (2) rendering of data needs to satisfy certain constraints, such as data being preserved across zoom levels and adjacent regions, and (3) after satisfying the constraints, an optimal solution needs to be chosen based on objectives such as maximality, fairness, and importance of data.\n This article formally defines and presents a complete solution to the thinning problem. First, we express the problem as an integer programming formulation that efficiently solves thinning for desired objectives. Second, we present more efficient solutions for maximality, based on DFS traversal of a spatial tree. Third, we consider the common special case of point datasets, and present an even more efficient randomized algorithm. Fourth, we show that contiguous regions are tractable for a general version of maximality for which arbitrary regions are intractable. Fifth, we examine the structure of our integer programming formulation and show that for point datasets, our program is integral. Finally, we have implemented all techniques from this article in Google Maps [Google 2005] visualizations of fusion tables [Gonzalez et al. 2010], and we describe a set of experiments that demonstrate the trade-offs among the algorithms.","PeriodicalId":50915,"journal":{"name":"ACM Transactions on Database Systems","volume":"38 1","pages":"22"},"PeriodicalIF":2.2000,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":"{\"title\":\"Consistent thinning of large geographical data for map visualization\",\"authors\":\"A. Sarma, Hongrae Lee, Hector Gonzalez, J. Madhavan, A. Halevy\",\"doi\":\"10.1145/2539032.2539034\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Large-scale map visualization systems play an increasingly important role in presenting geographic datasets to end-users. Since these datasets can be extremely large, a map rendering system often needs to select a small fraction of the data to visualize them in a limited space. This article addresses the fundamental challenge of thinning: determining appropriate samples of data to be shown on specific geographical regions and zoom levels. Other than the sheer scale of the data, the thinning problem is challenging because of a number of other reasons: (1) data can consist of complex geographical shapes, (2) rendering of data needs to satisfy certain constraints, such as data being preserved across zoom levels and adjacent regions, and (3) after satisfying the constraints, an optimal solution needs to be chosen based on objectives such as maximality, fairness, and importance of data.\\n This article formally defines and presents a complete solution to the thinning problem. 
First, we express the problem as an integer programming formulation that efficiently solves thinning for desired objectives. Second, we present more efficient solutions for maximality, based on DFS traversal of a spatial tree. Third, we consider the common special case of point datasets, and present an even more efficient randomized algorithm. Fourth, we show that contiguous regions are tractable for a general version of maximality for which arbitrary regions are intractable. Fifth, we examine the structure of our integer programming formulation and show that for point datasets, our program is integral. Finally, we have implemented all techniques from this article in Google Maps [Google 2005] visualizations of fusion tables [Gonzalez et al. 2010], and we describe a set of experiments that demonstrate the trade-offs among the algorithms.\",\"PeriodicalId\":50915,\"journal\":{\"name\":\"ACM Transactions on Database Systems\",\"volume\":\"38 1\",\"pages\":\"22\"},\"PeriodicalIF\":2.2000,\"publicationDate\":\"2013-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"13\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Database Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1145/2539032.2539034\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Database Systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/2539032.2539034","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 13
Abstract
Large-scale map visualization systems play an increasingly important role in presenting geographic datasets to end-users. Since these datasets can be extremely large, a map rendering system often needs to select a small fraction of the data to visualize in a limited space. This article addresses the fundamental challenge of thinning: determining appropriate samples of data to be shown at specific geographical regions and zoom levels. Beyond the sheer scale of the data, the thinning problem is challenging for several other reasons: (1) data can consist of complex geographical shapes; (2) the rendering must satisfy certain constraints, such as data being preserved across zoom levels and adjacent regions; and (3) once the constraints are satisfied, an optimal solution must be chosen based on objectives such as maximality, fairness, and importance of data.
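To make these constraints concrete, the following is a minimal formalization for point data (an editor's sketch under simplifying assumptions, not the authors' exact program). Let x_{r,z} \in \{0,1\} indicate whether record r is shown at zoom level z, let cells(z) be the spatial partition of the map at zoom z, and let K be the maximum number of records a cell may display:

    \max \sum_{r,z} x_{r,z}
    \text{s.t.} \quad \sum_{r \in c} x_{r,z} \le K \quad \forall z,\ \forall c \in \text{cells}(z) \quad \text{(bounded data per region)}
    \qquad\quad x_{r,z} \le x_{r,z+1} \quad \forall r,z \quad \text{(zoom consistency: a record visible at a coarse zoom stays visible when zooming in)}
    \qquad\quad x_{r,z} \in \{0,1\}

Maximizing the total number of shown (record, zoom) pairs is one reading of the maximality objective; fairness and importance would swap in a different objective function over the same feasible set.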
This article formally defines and presents a complete solution to the thinning problem. First, we express the problem as an integer programming formulation that efficiently solves thinning for desired objectives. Second, we present more efficient solutions for maximality, based on DFS traversal of a spatial tree. Third, we consider the common special case of point datasets, and present an even more efficient randomized algorithm. Fourth, we show that contiguous regions are tractable for a general version of maximality for which arbitrary regions are intractable. Fifth, we examine the structure of our integer programming formulation and show that for point datasets, our program is integral. Finally, we have implemented all techniques from this article in Google Maps [Google 2005] visualizations of fusion tables [Gonzalez et al. 2010], and we describe a set of experiments that demonstrate the trade-offs among the algorithms.
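The randomized algorithm for point datasets lends itself to a compact illustration. Below is a minimal sketch (our own Python, not the paper's code; thin_points, capacity, and the 2^z x 2^z grid are illustrative assumptions) that assigns each point one fixed random priority and keeps the highest-priority points per grid cell at every zoom level. Because the cells at zoom z+1 subdivide the cells at zoom z, a point ranking in the top K of a coarse cell competes against only a subset of the same points in its finer sub-cell, so it remains visible as the user zooms in.

    import random
    from collections import defaultdict

    def thin_points(points, max_zoom, capacity=50, seed=42):
        """Consistent random thinning sketch for point data.

        points: list of (x, y) pairs with coordinates normalized to [0, 1).
        Returns a dict mapping zoom level -> set of point indices to render.
        """
        rng = random.Random(seed)
        # One fixed random priority per point, shared by all zoom levels.
        priority = [rng.random() for _ in points]
        visible = defaultdict(set)
        for z in range(max_zoom + 1):
            n = 2 ** z  # a 2^z x 2^z grid of cells at zoom level z
            cells = defaultdict(list)
            for i, (x, y) in enumerate(points):
                cells[(int(x * n), int(y * n))].append(i)
            # Keep only the `capacity` highest-priority points per cell.
            for members in cells.values():
                members.sort(key=lambda i: priority[i], reverse=True)
                visible[z].update(members[:capacity])
        return visible

A quick check of the consistency property: if index i is among the top `capacity` priorities of its cell at zoom z, then fewer than `capacity` points can outrank it in the sub-cell at zoom z+1 (that sub-cell holds a subset of the coarse cell's points), so i also appears in visible[z+1].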
About the journal:
Heavily used in both academic and corporate R&D settings, ACM Transactions on Database Systems (TODS) is a key publication for computer scientists working in data abstraction, data modeling, and designing data management systems. Topics include storage and retrieval, transaction management, distributed and federated databases, semantics of data, intelligent databases, and operations and algorithms relating to these areas. In this rapidly changing field, TODS provides insights into the thoughts of the best minds in database R&D.