VLDB Journal | Pub Date: 2025-01-01 | Epub Date: 2024-12-23 | DOI: 10.1007/s00778-024-00880-x
Willi Mann, Nikolaus Augsten, Christian S Jensen, Mateusz Pawlik
{"title":"SWOOP: top-k similarity joins over set streams.","authors":"Willi Mann, Nikolaus Augsten, Christian S Jensen, Mateusz Pawlik","doi":"10.1007/s00778-024-00880-x","DOIUrl":"10.1007/s00778-024-00880-x","url":null,"abstract":"We provide efficient support for applications that aim to continuously find pairs of similar sets in rapid streams, such as Twitter streams that emit tweets as sets of words. Using a sliding window model, the top-k result changes as new sets enter the window or existing ones leave the window. Specifically, when a set arrives, it may form a new top-k result pair with any set already in the window. When a set leaves the window, all its pairings in the top-k result must be replaced with other pairs. It is therefore not sufficient to maintain the k most similar pairs since less similar pairs may become top-k pairs later. We propose SWOOP, a highly scalable stream join algorithm. Novel indexing techniques and sophisticated filters efficiently prune obsolete pairs as new sets enter the window. SWOOP incrementally maintains a provably minimal stock of similar pairs to update the top-k result at any time. Empirical studies confirm that SWOOP is able to support stream rates that are orders of magnitude faster than the rates supported by existing approaches.","PeriodicalId":49373,"journal":{"name":"Vldb Journal","volume":"34 1","pages":"13"},"PeriodicalIF":2.8,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11666680/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142899857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
VLDB Journal | Pub Date: 2023-09-07 | DOI: 10.1007/s00778-023-00811-2
Diego Arroyuelo, Adrián Gómez-Brandón, A. Hogan, G. Navarro, J. Rojas-Ledesma
{"title":"Optimizing RPQs over a compact graph representation","authors":"Diego Arroyuelo, Adrián Gómez-Brandón, A. Hogan, G. Navarro, J. Rojas-Ledesma","doi":"10.1007/s00778-023-00811-2","DOIUrl":"https://doi.org/10.1007/s00778-023-00811-2","url":null,"abstract":"","PeriodicalId":49373,"journal":{"name":"Vldb Journal","volume":" ","pages":""},"PeriodicalIF":4.2,"publicationDate":"2023-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44150829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
VLDB Journal | Pub Date: 2023-08-21 | DOI: 10.1007/s00778-023-00806-z
Nikolai Karpov, Haoyu Zhang, Qin Zhang
{"title":"MinJoin++: a fast algorithm for string similarity joins under edit distance","authors":"Nikolai Karpov, Haoyu Zhang, Qin Zhang","doi":"10.1007/s00778-023-00806-z","DOIUrl":"https://doi.org/10.1007/s00778-023-00806-z","url":null,"abstract":"","PeriodicalId":49373,"journal":{"name":"Vldb Journal","volume":" ","pages":""},"PeriodicalIF":4.2,"publicationDate":"2023-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45719826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
VLDB Journal | Pub Date: 2023-08-15 | DOI: 10.1007/s00778-023-00807-y
Tongyu Liu, Ju Fan, Guoliang Li, Nan Tang, Xiaoyong Du
{"title":"Tabular data synthesis with generative adversarial networks: design space and optimizations","authors":"Tongyu Liu, Ju Fan, Guoliang Li, Nan Tang, Xiaoyong Du","doi":"10.1007/s00778-023-00807-y","DOIUrl":"https://doi.org/10.1007/s00778-023-00807-y","url":null,"abstract":"","PeriodicalId":49373,"journal":{"name":"Vldb Journal","volume":" ","pages":""},"PeriodicalIF":4.2,"publicationDate":"2023-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43778051","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
VLDB Journal | Pub Date: 2023-08-08 | DOI: 10.1007/s00778-023-00804-1
K. Mouratidis, Keming Li, Bo Tang
{"title":"Quantifying the competitiveness of a dataset in relation to general preferences","authors":"K. Mouratidis, Keming Li, Bo Tang","doi":"10.1007/s00778-023-00804-1","DOIUrl":"https://doi.org/10.1007/s00778-023-00804-1","url":null,"abstract":"","PeriodicalId":49373,"journal":{"name":"Vldb Journal","volume":" ","pages":""},"PeriodicalIF":4.2,"publicationDate":"2023-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46573430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
VLDB Journal | Pub Date: 2023-07-07 | DOI: 10.1007/s00778-023-00802-3
Yan Zhao, Kai Zheng, Ziwei Wang, Liwei Deng, B. Yang, T. Pedersen, Christian S. Jensen, Xiaofang Zhou
{"title":"Coalition-based task assignment with priority-aware fairness in spatial crowdsourcing","authors":"Yan Zhao, Kai Zheng, Ziwei Wang, Liwei Deng, B. Yang, T. Pedersen, Christian S. Jensen, Xiaofang Zhou","doi":"10.1007/s00778-023-00802-3","DOIUrl":"https://doi.org/10.1007/s00778-023-00802-3","url":null,"abstract":"","PeriodicalId":49373,"journal":{"name":"Vldb Journal","volume":" ","pages":""},"PeriodicalIF":4.2,"publicationDate":"2023-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42035332","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}