Parallel Co-location Pattern Mining based on Neighbor-Dependency Partition and Column Calculation
Authors: Peizhong Yang, Lizhen Wang, Xiaoxuan Wang, Lihua Zhou, Hongmei Chen
Published in: Proceedings of the 29th International Conference on Advances in Geographic Information Systems, 2021-11-02
DOI: https://doi.org/10.1145/3474717.3483984
A co-location pattern is a subset of spatial features whose instances are frequently located together in proximate areas. Mining co-location patterns can reveal spatial dependencies in spatial datasets and has particular value in many applications. However, discovering co-location patterns from massive spatial datasets is challenging because of the high computational cost. In this paper, we present a novel parallel co-location pattern mining approach. First, spatial neighbor relationships are divided into neighbor-dependency partitions, so that the mining task can be performed on each partition independently and in parallel. Then, a column-based calculation approach is proposed to replace the time-consuming generation of table instances when calculating the prevalence of patterns. To further reduce the pattern search space on each partition, two pruning strategies are proposed. We implement the resulting algorithm, named PCPM-NDPCC (parallel co-location pattern mining based on neighbor-dependency partition and column calculation), on MapReduce. Extensive experiments on real and synthetic datasets examine the performance of PCPM-NDPCC. The results show that PCPM-NDPCC is significantly more efficient than the baseline algorithms and scales better for massive spatial data processing.
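The abstract does not define the prevalence measure or the details of the column calculation, so the following is only a minimal sketch under stated assumptions, not the PCPM-NDPCC algorithm. It assumes prevalence is the participation index commonly used in co-location mining (the minimum, over a pattern's features, of the fraction of that feature's instances that take part in the pattern), and it covers size-2 patterns only. The toy data, the distance threshold, and the names `neighbor_pairs` and `pair_participation_index` are hypothetical. The point it illustrates is that the index can be computed from one set of participating instance ids per feature (a "column") without storing the full table of row instances.

```python
from collections import defaultdict
from itertools import combinations
from math import dist

# Hypothetical toy data: (feature, instance_id, x, y).
INSTANCES = [
    ("A", 1, 0.0, 0.0),
    ("A", 2, 5.0, 5.0),
    ("A", 3, 9.0, 0.0),
    ("B", 1, 0.5, 0.0),
    ("B", 2, 5.0, 5.5),
    ("C", 1, 0.0, 0.6),
    ("C", 2, 20.0, 20.0),
]


def neighbor_pairs(instances, d):
    """Yield pairs of instances of different features within distance d."""
    for p, q in combinations(instances, 2):
        if p[0] != q[0] and dist(p[2:], q[2:]) <= d:
            yield p, q


def pair_participation_index(instances, d):
    """Participation index of each size-2 pattern, built from per-feature columns.

    Assumption: prevalence = participation index. For size-2 patterns an
    instance participates as soon as it has one neighbor of the other feature,
    so only the set of participating ids per feature needs to be kept.
    """
    # Total number of instances per feature (the denominators).
    totals = defaultdict(int)
    for feature, *_ in instances:
        totals[feature] += 1

    # columns[pattern][feature] = ids of that feature's participating instances.
    columns = defaultdict(lambda: defaultdict(set))
    for p, q in neighbor_pairs(instances, d):
        pattern = tuple(sorted((p[0], q[0])))
        columns[pattern][p[0]].add(p[1])
        columns[pattern][q[0]].add(q[1])

    return {
        pattern: min(len(cols[f]) / totals[f] for f in pattern)
        for pattern, cols in columns.items()
    }


if __name__ == "__main__":
    for pattern, pi in sorted(pair_participation_index(INSTANCES, 1.0).items()):
        print(pattern, round(pi, 3))
```

With a threshold of 1.0, pattern (A, B) has two of three A instances and both B instances participating, so its index is 2/3. Larger patterns would additionally require the participating instances to form cliques under the neighbor relationship, which this sketch does not attempt.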