Constantinos Costa, Georgios Chatzimilioudis, D. Zeinalipour-Yazti, M. Mokbel
{"title":"Towards Real-Time Road Traffic Analytics using Telco Big Data","authors":"Constantinos Costa, Georgios Chatzimilioudis, D. Zeinalipour-Yazti, M. Mokbel","doi":"10.1145/3129292.3129296","DOIUrl":"https://doi.org/10.1145/3129292.3129296","url":null,"abstract":"A telecommunication company (telco) is traditionally perceived only as the entity that provides telecommunication services, such as telephony and data communication access, to users. However, the IP backbone infrastructure of such entities, spanning dense urban spaces and wide rural areas, nowadays provides a unique opportunity to collect immense amounts of mobility data that can provide valuable insights for road traffic management and avoidance. In this paper we outline the components of the Traffic-TBD (Traffic Telco Big Data) architecture, which aims to become an innovative road traffic analytic and prediction system with the following desiderata: i) provide micro-level traffic modeling and prediction that goes beyond the current state provided by Internet-based navigation enterprises utilizing crowdsourcing; ii) retain the location privacy boundaries of users inside their mobile network operator, to avoid the risks of exposing location data to third-party mobile applications; and iii) be available with minimal costs and using existing infrastructure (i.e., cell towers and TBD data streams are readily available inside a telco). 
Road traffic understanding, management and analytics can minimize the number of road accidents, optimize fuel and energy consumption, avoid unexpected delays, contribute to a macroscopic spatio-temporal understanding of traffic in cities but also to \"smart\" societies through applications in city planning, public transportation, logistics and fleet management for enterprises, startups and governmental bodies.","PeriodicalId":407894,"journal":{"name":"Proceedings of the International Workshop on Real-Time Business Intelligence and Analytics","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116585085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alok Pareek, Bhushan Khaladkar, Rajkumar Sen, Basar Onat, V. Nadimpalli, M. Agarwal, Nicholas Keene
{"title":"Striim: A streaming analytics platform for real-time business decisions","authors":"Alok Pareek, Bhushan Khaladkar, Rajkumar Sen, Basar Onat, V. Nadimpalli, M. Agarwal, Nicholas Keene","doi":"10.1145/3129292.3129294","DOIUrl":"https://doi.org/10.1145/3129292.3129294","url":null,"abstract":"Real-time decisions and insights over real-time data have become the essential mantra of success for many enterprises. Real-time data is generated from a multitude of sources and arrives in a streaming fashion with high volume and velocity. The data could be machine generated (e.g., clickstream data, logs, sensor data from IoT devices) or human generated (e.g., social data, mission-critical transactional data). This is causing a technological shift from storage-driven architectures to event-driven architectures so that enterprises can capture, integrate and analyze these large sets of data for real-time decision making. Striim is a novel end-to-end analytics platform that enables business users to easily develop and deploy analytical applications that can generate real-time insights over real-time streaming data; business users and developers use a SQL-like declarative language (that has been extended to include streaming semantics) to write application logic in Striim. Striim provides high-throughput, low-latency event processing on commodity hardware with a scale-out architecture. 
In this paper, we describe the architecture of Striim and discuss some of the key aspects of the platform: (i) built-in real-time data capture, including streaming change data capture from transactional databases; (ii) a natively built storage and query engine that uses modern data structures like skip lists to store streaming window data and performs query optimization, planning and run-time code generation; and (iii) application decoupling using persisted streams.","PeriodicalId":407894,"journal":{"name":"Proceedings of the International Workshop on Real-Time Business Intelligence and Analytics","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123528373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Self-Regulating Streaming Systems: Challenges and Opportunities","authors":"A. Floratou, Ashvin Agrawal","doi":"10.1145/3129292.3129295","DOIUrl":"https://doi.org/10.1145/3129292.3129295","url":null,"abstract":"In recent years, stream processing systems have been deployed in almost every organization due to the explosion of large-scale analytics applications. Our discussions with users of these systems within Microsoft and Twitter have revealed that a major challenge with these frameworks is to tune them in order to meet the required performance and also maintain this level of performance over time. In this paper, we present the open problems and challenges in supporting streaming systems that self-regulate. Such systems automatically adjust their configuration to meet service level objectives (SLOs) even in the presence of external load variations or internal faults such as slow hardware. To address some of these challenges, we propose using machine learning techniques such as supervised learning and reinforcement learning, which can potentially further improve the application management lifecycle. We believe that exploring machine learning in the context of self-regulating streaming systems is a rich area for future research that can impact the ways streaming applications are managed.","PeriodicalId":407894,"journal":{"name":"Proceedings of the International Workshop on Real-Time Business Intelligence and Analytics","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124340727","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Damianos Chatziantoniou, M. Castellanos, Panos K. Chrysanthis
{"title":"Proceedings of the International Workshop on Real-Time Business Intelligence and Analytics","authors":"Damianos Chatziantoniou, M. Castellanos, Panos K. Chrysanthis","doi":"10.1145/3129292","DOIUrl":"https://doi.org/10.1145/3129292","url":null,"abstract":"The Eleventh International Workshop on Real-Time Business Intelligence and Analytics (BIRTE 2017), which was held on August 28, 2017 in conjunction with the VLDB 2017 Conference, provided a forum for presentation of the latest research results, new technology developments, and new applications in the areas of business intelligence, data streams and real time enterprise.","PeriodicalId":407894,"journal":{"name":"Proceedings of the International Workshop on Real-Time Business Intelligence and Analytics","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120962048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Dynamic Data Placement for Polystore Ingestion","authors":"Jiang Du, John Meehan, Nesime Tatbul, S. Zdonik","doi":"10.1145/3129292.3129297","DOIUrl":"https://doi.org/10.1145/3129292.3129297","url":null,"abstract":"Integrating low-latency data streaming into data warehouse architectures has become an important enhancement to support modern data warehousing applications. In these architectures, heterogeneous workloads with data ingestion and analytical queries must be executed with strict performance guarantees. Furthermore, the data warehouse may consist of multiple different types of storage engines (a.k.a. polystores or multi-stores). A paramount problem is data placement; different workload scenarios call for different data placement designs. Moreover, workload conditions change frequently. In this paper, we provide evidence that a dynamic, workload-driven approach is needed for data placement in polystores with low-latency data ingestion support. We study the problem based on the characteristics of the TPC-DI benchmark in the context of an abbreviated polystore that consists of S-Store and Postgres.","PeriodicalId":407894,"journal":{"name":"Proceedings of the International Workshop on Real-Time Business Intelligence and Analytics","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131024932","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rakan Alseghayer, Daniel Petrov, Panos K. Chrysanthis, M. Sharaf, Alexandros Labrinidis
{"title":"Detection of Highly Correlated Live Data Streams","authors":"Rakan Alseghayer, Daniel Petrov, Panos K. Chrysanthis, M. Sharaf, Alexandros Labrinidis","doi":"10.1145/3129292.3129298","DOIUrl":"https://doi.org/10.1145/3129292.3129298","url":null,"abstract":"More and more organizations (commercial, health, government and security) currently base their decisions on real-time analysis of fast arriving, large volumes of data streams. For such analysis to lead to actionable information in real-time and at the right time, the most recent data needs to be processed within a specified delay target. Effective solutions for analysis of such data streams rely on two techniques, (1) incremental sliding-window computation of aggregates, to avoid unnecessary recomputations and (2) intelligent scheduling of computational steps and operations. In this paper, we propose a solution that combines both of these techniques to find highly correlated data streams in real-time, using the Pearson Correlation Coefficient as a correlation metric for two windows of data streams. Specifically, we propose to partition a set of data streams into micro-batches that capture the delay target, use sliding windows within a range as the subsequences of values exhibiting a certain level of correlation, utilize the idea of sufficient statistics to incrementally compute the Pearson Correlation Coefficient of pairs of sliding windows, and adopt a deadline-aware priority scheduling to detect the highly correlated pairs of data streams. 
Our experimental results show that our scheme, and in particular our Price-DCS with warm start scheduling algorithm, outperforms existing ones and enables a high degree of interactivity in correlating micro-batches of live data streams.","PeriodicalId":407894,"journal":{"name":"Proceedings of the International Workshop on Real-Time Business Intelligence and Analytics","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122523693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
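The abstract above mentions using sufficient statistics to compute the Pearson Correlation Coefficient incrementally over sliding windows. A minimal Python sketch of that general idea follows. It is not the paper's implementation: the class name `SlidingPearson`, its methods, and the fixed-size count-based window are assumptions made for illustration, and the micro-batching and deadline-aware scheduling described in the abstract are not covered.

```python
from collections import deque
from math import sqrt


class SlidingPearson:
    """Hypothetical helper: Pearson correlation over the last `size` (x, y) pairs.

    Only five running sums (the sufficient statistics) are updated per sample,
    so each update costs O(1) instead of rescanning the window.
    """

    def __init__(self, size):
        if size < 2:
            raise ValueError("window size must be at least 2")
        self.size = size
        self.window = deque()
        # Sufficient statistics: sum x, sum y, sum x^2, sum y^2, sum x*y.
        self.sx = self.sy = self.sxx = self.syy = self.sxy = 0.0

    def push(self, x, y):
        """Add the newest pair and evict the oldest one if the window is full."""
        self.window.append((x, y))
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.syy += y * y
        self.sxy += x * y
        if len(self.window) > self.size:
            ox, oy = self.window.popleft()
            self.sx -= ox
            self.sy -= oy
            self.sxx -= ox * ox
            self.syy -= oy * oy
            self.sxy -= ox * oy

    def correlation(self):
        """Return Pearson's r for the current window, or None if undefined."""
        n = len(self.window)
        if n < 2:
            return None
        cov = n * self.sxy - self.sx * self.sy
        var_x = n * self.sxx - self.sx * self.sx
        var_y = n * self.syy - self.sy * self.sy
        if var_x <= 0 or var_y <= 0:
            return None  # a constant stream has no defined correlation
        return cov / sqrt(var_x * var_y)


def naive_pearson(pairs):
    """Reference implementation that rescans every pair, for comparison."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    vx = sum((x - mx) ** 2 for x, _ in pairs)
    vy = sum((y - my) ** 2 for _, y in pairs)
    return cov / sqrt(vx * vy)
```

Calling `push` once per arriving pair and `correlation` whenever a result is needed keeps the per-sample cost constant. Repeatedly adding and subtracting running sums can accumulate floating-point error on long streams, so a production version would likely recompute the sums from the window periodically.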