Daichi Amagata, Shohei Tsuruoka, Yusuke Arai, T. Hara
Feat-SKSJ: Fast and Exact Algorithm for Top-k Spatial-Keyword Similarity Join
Proceedings of the 29th International Conference on Advances in Geographic Information Systems, 2021-11-02
DOI: https://doi.org/10.1145/3474717.3483629
Citations: 6
Abstract
Due to the proliferation of GPS-enabled mobile devices and IoT environments, location-based services generate large numbers of objects that carry both spatial and keyword information, and spatial-keyword databases are receiving much attention. This paper addresses the top-k spatial-keyword similarity join, which outputs the k object pairs with the highest similarity. This query is a primitive operator for important applications, including duplicate detection, recommendation, and clustering. Its main bottleneck is computing the exact similarity of a given object pair. To avoid this computation as much as possible, a state-of-the-art algorithm uses a filter that can skip the exact similarity computation for a given pair. However, this algorithm suffers from a loose threshold in its first stage, a high filtering cost, and an inability to filter many pairs in a batch. We propose Feat-SKSJ, which removes these drawbacks and quickly outputs the exact result. Extensive experiments on real datasets show that Feat-SKSJ is significantly faster than the state-of-the-art algorithm.
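To make the problem statement concrete, the following is a minimal sketch of a filter-based top-k join. It is not Feat-SKSJ and not the state-of-the-art algorithm the abstract refers to. The abstract does not define the similarity function, so the sketch assumes a weighted sum of a distance-based spatial score and the Jaccard similarity of the keyword sets; the names `topk_join`, `alpha`, and `max_dist` are hypothetical. The filter relies on the fact that the Jaccard similarity of two sets is at most the ratio of the smaller set size to the larger one, which gives a cheap upper bound on the pair's similarity.

```python
import heapq
from itertools import combinations
from math import hypot


def jaccard(a, b):
    """Exact keyword similarity (assumed here; the abstract does not fix one)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def spatial_sim(p, q, max_dist):
    """Assumed spatial score: 1 at distance 0, falling linearly to 0 at max_dist."""
    return max(0.0, 1.0 - hypot(p[0] - q[0], p[1] - q[1]) / max_dist)


def topk_join(objects, k, alpha=0.5, max_dist=10.0):
    """Return the k most similar pairs and the number of exact Jaccard computations.

    objects: list of ((x, y), keyword_set). All pairs are enumerated, so this
    is quadratic; only the exact keyword similarity is skipped by the filter.
    """
    heap = []  # min-heap of (similarity, i, j); heap[0] holds the current threshold
    exact_computations = 0
    for (i, (p, kw_p)), (j, (q, kw_q)) in combinations(enumerate(objects), 2):
        s_sim = spatial_sim(p, q, max_dist)
        # Cheap upper bound: Jaccard(a, b) <= min(|a|, |b|) / max(|a|, |b|).
        small, large = sorted((len(kw_p), len(kw_q)))
        kw_bound = small / large if large else 0.0
        upper = alpha * s_sim + (1 - alpha) * kw_bound
        if len(heap) == k and upper <= heap[0][0]:
            continue  # the pair cannot beat the current k-th best, so skip it
        exact_computations += 1
        sim = alpha * s_sim + (1 - alpha) * jaccard(kw_p, kw_q)
        if len(heap) < k:
            heapq.heappush(heap, (sim, i, j))
        elif sim > heap[0][0]:
            heapq.heapreplace(heap, (sim, i, j))
    return sorted(heap, reverse=True), exact_computations


sample = [
    ((0.0, 0.0), {"cafe", "wifi"}),
    ((0.0, 1.0), {"cafe", "wifi"}),
    ((5.0, 5.0), {"bar"}),
    ((0.0, 0.5), {"cafe", "wifi", "books", "park"}),
]
best_pairs, computed = topk_join(sample, k=1)
print(best_pairs, computed)
```

The sketch also shows why a loose threshold in the first stage hurts: until the heap holds k pairs with high similarity, the upper-bound test rejects very few candidates, so most pairs still pay for the exact computation.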