Title: A generic front-stage for semi-stream processing
Authors: M. Naeem, Gerald Weber, G. Dobbie, C. Lutteroth
Venue: Proceedings of the 22nd ACM international conference on Information & Knowledge Management
Published: 2013-10-27
DOI: 10.1145/2505515.2505734 (https://doi.org/10.1145/2505515.2505734)
Citations: 3
Abstract
Several semi-stream join algorithms have been published recently. Their typical setup consists of one fast input stream that must be joined with a disk-based relation R. These approaches typically perform the join within a limited main-memory partition, which is generally too small to hold the whole of R. We propose a caching approach that can serve as a front-stage for different semi-stream join algorithms, yielding significant performance gains for common applications. We analyze the approach in the context of a seminal semi-stream join, MESHJOIN (Mesh Join), and provide a cost model for the resulting semi-stream join algorithm, which we call CMESHJOIN (Cached Mesh Join). The algorithm takes advantage of skewed distributions; this article presents results for Zipfian distributions of the type that appear in many applications.
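The front-stage idea can be illustrated with a minimal Python sketch. It is not the authors' CMESHJOIN implementation or cost model. The class name `CachedFrontStage`, the `backend` callback, and the helper `zipf_hit_fraction` are hypothetical names introduced here, and the sketch assumes a static cache that already holds the most frequently referenced tuples of R. Stream tuples whose key is cached are joined immediately in memory; all others are forwarded to a back-end semi-stream join such as MESHJOIN.

```python
from typing import Callable, Dict, Hashable, Optional

Row = Dict[str, object]


class CachedFrontStage:
    """Illustrative front-stage (hypothetical, not the paper's code).

    Holds a small in-memory subset of R keyed by join attribute and
    forwards cache misses to a back-end semi-stream join.
    """

    def __init__(self, cache: Dict[Hashable, Row],
                 backend: Callable[[Hashable, Row], None]) -> None:
        self.cache = cache        # assumed: the most frequent tuples of R
        self.backend = backend    # assumed: enqueues the tuple for the disk-based join
        self.hits = 0
        self.misses = 0

    def process(self, key: Hashable, stream_row: Row) -> Optional[Row]:
        """Join one stream tuple; return the joined row on a cache hit."""
        r_row = self.cache.get(key)
        if r_row is not None:
            self.hits += 1
            return {**stream_row, **r_row}
        # Cache miss: the back-end join must match this tuple against R on disk.
        self.misses += 1
        self.backend(key, stream_row)
        return None


def zipf_hit_fraction(cache_size: int, relation_size: int,
                      exponent: float = 1.0) -> float:
    """Share of stream tuples served by a cache holding the top-ranked keys.

    Assumes key frequencies proportional to rank ** -exponent (Zipfian).
    """
    head = sum(rank ** -exponent for rank in range(1, cache_size + 1))
    total = sum(rank ** -exponent for rank in range(1, relation_size + 1))
    return head / total
```

Under these assumptions, `zipf_hit_fraction(k, n)` gives the fraction of stream tuples a cache of the `k` most frequent keys would absorb out of `n` distinct keys. With a skewed distribution that fraction can be large even when `k` is much smaller than `n`, which is the effect the front-stage relies on.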