A. Kammoun, Syed Gillani, C. Gravier, Julien Subercaze
Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems, published 2016-06-13. DOI: https://doi.org/10.1145/2933267.2933507
High performance top-k processing of non-linear windows over data streams
This year's DEBS Grand Challenge offers two very challenging queries over social network data. Each query, for a different reason, cannot be handled by traditional techniques and therefore calls for a dedicated architecture and dedicated data structures. In the first query, the novelty is the non-linear expiration of elements. Since a traditional sliding window is not suitable, we investigate the data structures offering the best trade-offs across all the required operations. In the second query, unlike traditional approaches where no persistent data is stored over the stream, we have to manage a friendship graph that persists throughout the system's execution. Because this structure is central, it requires careful design. The common point of the algorithmic approaches that we developed for both queries is the extensive use of upper and lower bounds, so that expensive computations are executed only when required. For Query 1, we devise a bound based on the score decay. For Query 2, we use Turán's theorem to limit the clique computation. The combination of lazy evaluation, careful implementation, and thorough testing leads to an efficient stream processing system.
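The abstract does not spell out how Turán's theorem is applied, so the following is only a minimal sketch of the general idea of bounding a clique size before computing it. It assumes each candidate is a simple graph summarized by its vertex count `n` and edge count `m`. The function names, the `kth_best_size` threshold, and the pairing with an elementary edge-count upper bound (which is not Turán's theorem) are all illustrative assumptions, not the authors' implementation.

```python
from math import isqrt


def turan_clique_lower_bound(n: int, m: int) -> int:
    """Lower bound on the clique number of a simple graph (n vertices, m edges).

    Turan's theorem: a graph with no (r+1)-clique has at most
    (1 - 1/r) * n^2 / 2 edges. Rearranged, the clique number is at
    least n^2 / (n^2 - 2m), rounded up.
    """
    if n < 0 or m < 0 or 2 * m > n * (n - 1):
        raise ValueError("need n >= 0 and 0 <= m <= n(n-1)/2")
    if n == 0:
        return 0
    denom = n * n - 2 * m  # always >= n > 0 for a simple graph
    return (n * n + denom - 1) // denom  # integer ceiling of n^2 / denom


def edge_count_clique_upper_bound(n: int, m: int) -> int:
    """Upper bound on the clique number by simple counting (not Turan).

    A k-clique needs k(k-1)/2 edges, so k <= (1 + sqrt(1 + 8m)) / 2,
    and k can never exceed the number of vertices.
    """
    return min(n, (1 + isqrt(1 + 8 * m)) // 2)


def can_skip_exact_clique(n: int, m: int, kth_best_size: int) -> bool:
    """Hypothetical lazy-evaluation test: skip the expensive exact clique
    search when even the upper bound cannot reach the current k-th best."""
    return edge_count_clique_upper_bound(n, m) < kth_best_size


if __name__ == "__main__":
    # Complete graph on 4 vertices: both bounds meet at the true value.
    print(turan_clique_lower_bound(4, 6), edge_count_clique_upper_bound(4, 6))
    # 5-cycle (true clique number 2): the bounds bracket the answer.
    print(turan_clique_lower_bound(5, 5), edge_count_clique_upper_bound(5, 5))
```

In a sketch like this, the exact clique search runs only for candidates whose bounds leave the outcome undecided, which is the "execute expensive computations only when required" pattern the abstract describes.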