Proceedings of the International Workshop on Real-Time Business Intelligence and Analytics: Latest Publications

Query-Driven Data Profiling with OCEANProfile

Proceedings of the International Workshop on Real-Time Business Intelligence and Analytics Pub Date : 2018-08-27 DOI: 10.1145/3242153.3242154

A. Wahl, Christian Sauerhammer, Peter K. Schwab, Sebastian Herbst, R. Lenz

Abstract: Complex data analysis scenarios often require discovering and combining multiple data sources. Data scientists usually formulate a series of SQL queries that build on each other, also called a session, to iteratively derive results. However, due to a lack of familiarity with the data sources or the complexity of query results, it can be hard to decide on the next query iteration based solely on the results of the last one. While existing approaches provide mechanisms to assess the results of a specific query, support for analyzing results in the context of the respective session remains mostly absent. Such approaches also do not integrate seamlessly with established tools and workflows. To overcome these problems, we introduce OCEANProfile, a framework for session-based profiling of query results. Query results are intercepted at driver level and streamed into our framework for automated data profiling. Result profiles can be compared with those of previous queries and visualized in a companion app compatible with existing analysis tools. Visualizations are automatically ranked according to their usefulness in the context of the respective session.

Citations: 2
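The OCEANProfile abstract above describes profiling each query result and comparing it with the profiles of earlier queries in the same session. The following is a minimal Python sketch of that comparison step only, assuming query results arrive as lists of dictionaries; the helpers `profile_result` and `compare_profiles` and the chosen statistics are hypothetical illustrations, not OCEANProfile's API or its profiling algorithm.

```python
from typing import Any

Row = dict[str, Any]
Profile = dict[str, dict[str, Any]]


def profile_result(rows: list[Row]) -> Profile:
    """Compute simple per-column statistics for one query result (hypothetical)."""
    columns = sorted({key for row in rows for key in row})
    profile: Profile = {}
    for col in columns:
        values = [row.get(col) for row in rows]  # a missing column counts as NULL
        non_null = [v for v in values if v is not None]
        stats: dict[str, Any] = {
            "rows": len(values),
            "nulls": len(values) - len(non_null),
            "distinct": len(set(non_null)),
        }
        # Record a value range only for purely numeric columns.
        if non_null and all(isinstance(v, (int, float)) for v in non_null):
            stats["min"] = min(non_null)
            stats["max"] = max(non_null)
        profile[col] = stats
    return profile


def compare_profiles(previous: Profile, current: Profile) -> dict[str, dict[str, tuple[Any, Any]]]:
    """Return, per column, the statistics whose values changed between two queries."""
    changes: dict[str, dict[str, tuple[Any, Any]]] = {}
    for col in sorted(previous.keys() | current.keys()):
        before = previous.get(col, {})
        after = current.get(col, {})
        diff = {
            stat: (before.get(stat), after.get(stat))
            for stat in sorted(before.keys() | after.keys())
            if before.get(stat) != after.get(stat)
        }
        if diff:
            changes[col] = diff
    return changes


# Two consecutive queries of a session: the second adds a filter.
q1 = profile_result([{"id": 1, "price": 10.0}, {"id": 2, "price": None}])
q2 = profile_result([{"id": 1, "price": 10.0}])
print(compare_profiles(q1, q2))
```

Reporting only the statistics that changed keeps the session view focused on what the latest query actually altered, which is the kind of context the abstract says single-query tools lack.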

Oracle TimesTen Scaleout: A New Scale-Out In-Memory Database Architecture for Extreme OLTP

Proceedings of the International Workshop on Real-Time Business Intelligence and Analytics Pub Date : 2018-08-27 DOI: 10.1145/3242153.3271881

Y. Chou, A. Raghavan, T. Lahiri

Citations: 2

Proceedings of the International Workshop on Real-Time Business Intelligence and Analytics

Proceedings of the International Workshop on Real-Time Business Intelligence and Analytics Pub Date : 2018-08-27 DOI: 10.1145/3242153

Citations: 2

Moira: A Goal-Oriented Incremental Machine Learning Approach to Dynamic Resource Cost Estimation in Distributed Stream Processing Systems

Proceedings of the International Workshop on Real-Time Business Intelligence and Analytics Pub Date : 2018-08-27 DOI: 10.1145/3242153.3242160

D. Foroni, Cristian Axenie, S. Bortoli, Mohamad Al Hajj Hassan, Ralph Acker, R. Tudoran, G. Brasche, Yannis Velegrakis

Abstract: The need for real-time analysis keeps spreading, and the number of available streaming sources is increasing. The recent literature contains many works on Data Stream Processing (DSP). In a streaming environment, the incoming data rate varies over time, and the challenge is how to deploy these applications efficiently in a cluster. Several works have focused on improving the latency of the system or on minimizing the resources allocated to each application over time. However, to the best of our knowledge, none of the existing works takes into consideration the user's needs for a specific application, which differ from one user to another. In this paper, we propose Moira, a goal-oriented framework for dynamically optimizing resource allocation, built on top of Apache Flink. The system takes actions based on the user application and on the characteristics of the incoming data (i.e., input rate and window size). Starting from an initial estimation of the resources needed for the user query, at each iteration we improve our cost function with the metrics collected from the monitored system about the incoming data, in order to fulfill the user's needs. We present a series of experiments that show in which cases our dynamic estimation outperforms both the baseline Apache Flink and a rule-of-thumb estimation performed only at application deployment.

Citations: 5
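The Moira abstract above describes starting from an initial resource estimate and refining it at each iteration with metrics collected from the running system. The following is a minimal Python sketch of that incremental loop for a single quantity (number of parallel instances), assuming a saturated-job throughput sample is available. The class `ParallelismEstimator`, its parameters, and the use of exponential smoothing are hypothetical stand-ins; Moira's actual cost function is a learned model built on Apache Flink metrics and is not reproduced here.

```python
import math
from dataclasses import dataclass


@dataclass
class ParallelismEstimator:
    """Hypothetical incremental estimator of how many parallel instances a job needs."""

    per_instance_rate: float  # initial rule-of-thumb guess, records/s per instance
    alpha: float = 0.5        # weight given to each new observation
    headroom: float = 1.25    # hypothetical user goal: 25% spare capacity

    def observe(self, records_per_sec: float, instances: int) -> None:
        """Blend a monitored throughput sample into the per-instance estimate.

        Assumes the sample was taken while the instances were saturated, so the
        observed rate reflects their actual capacity.
        """
        observed = records_per_sec / instances
        self.per_instance_rate = (1 - self.alpha) * self.per_instance_rate + self.alpha * observed

    def recommend(self, input_rate: float) -> int:
        """Instances needed to absorb the expected input rate plus headroom."""
        return max(1, math.ceil(input_rate * self.headroom / self.per_instance_rate))


est = ParallelismEstimator(per_instance_rate=1000.0)
print(est.recommend(5000.0))  # 7 from the initial guess alone
est.observe(records_per_sec=3000.0, instances=2)
print(est.recommend(5000.0))  # 5 after learning that each instance handles more
```

The `headroom` parameter is where a per-user goal would enter in this sketch: two users running the same query with different latency requirements could receive different allocations.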

Streams and Tables: Two Sides of the Same Coin

Proceedings of the International Workshop on Real-Time Business Intelligence and Analytics Pub Date : 2018-08-27 DOI: 10.1145/3242153.3242155

Matthias Sax, Guozhang Wang, M. Weidlich, J. Freytag

Abstract: Stream processing has emerged as a paradigm for applications that require low-latency evaluation of operators over unbounded sequences of data. Defining the semantics of stream processing is challenging in the presence of distributed data sources. The physical and logical order of data in a stream may become inconsistent in such a setting. Existing models either neglect these inconsistencies or handle them by means of data buffering and reordering techniques, thereby compromising processing latency. In this paper, we introduce the Dual Streaming Model to reason about physical and logical order in data stream processing. This model presents the result of an operator as a stream of successive updates, which induces a duality of results and streams. As such, it provides a natural way to cope with inconsistencies between the physical and logical order of streaming data in a continuous manner, without explicit buffering and reordering. We further discuss the trade-offs and challenges faced when implementing this model in terms of correctness, latency, and processing cost. A case study based on Apache Kafka illustrates the effectiveness of our model in the light of real-world requirements.

Citations: 40
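The Dual Streaming Model abstract above describes presenting an operator's result as a stream of successive updates, so that out-of-order records revise earlier results instead of being buffered and reordered. The following is a minimal Python sketch of that idea for a per-key tumbling-window count, assuming non-negative integer event timestamps. The function `windowed_counts` and its `grace` parameter (how long a closed window still accepts late records) are hypothetical; this is neither the Kafka Streams API nor the paper's implementation.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def windowed_counts(
    records: Iterable[tuple[str, int]],
    window_size: int,
    grace: int,
) -> Iterator[tuple[str, int, int]]:
    """Yield (key, window_start, count) updates for tumbling-window counts.

    Every accepted record immediately emits the new count for its window, so the
    output is a changelog stream of a continuously updated result table. A late
    record revises its older window instead of being buffered, unless stream time
    has already passed that window's end plus the grace period.
    """
    table: defaultdict[tuple[str, int], int] = defaultdict(int)  # (key, window_start) -> count
    stream_time = -1  # highest event timestamp observed so far
    for key, ts in records:
        stream_time = max(stream_time, ts)
        window_start = ts - ts % window_size
        if window_start + window_size + grace <= stream_time:
            continue  # window already closed: drop the record
        table[(key, window_start)] += 1
        yield key, window_start, table[(key, window_start)]


# Timestamp 3 arrives after 7 (out of order); timestamp 4 arrives too late.
updates = list(windowed_counts(
    [("a", 1), ("a", 7), ("a", 3), ("a", 25), ("a", 4)], window_size=5, grace=10))
print(updates)  # [('a', 0, 1), ('a', 5, 1), ('a', 0, 2), ('a', 25, 1)]
```

The grace period illustrates the trade-off the abstract mentions: a longer grace period improves correctness for late data at the cost of keeping more window state.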

Anomaly Detection and Explanation Discovery on Event Streams

Proceedings of the International Workshop on Real-Time Business Intelligence and Analytics Pub Date : 2018-08-27 DOI: 10.1145/3242153.3242158

Fei Song, Boyao Zhou, Quan Sun, Wang Sun, Shiwen Xia, Y. Diao

Citations: 3

Real-time ETL in Striim

Proceedings of the International Workshop on Real-Time Business Intelligence and Analytics Pub Date : 2018-08-27 DOI: 10.1145/3242153.3242157

Alok Pareek, Bhushan Khaladkar, Rajkumar Sen, Basar Onat, V. Nadimpalli, L. Mahadevan

Abstract: In the new digital economy, on-demand access to real-time enterprise data is critical to modernizing cross-organizational, cross-partner, and online consumer functions. In addition to on-premises legacy data, enterprises are producing an enormous amount of real-time data through new hybrid cloud applications; these event streams need to be collected, transformed, and analyzed in real time to make critical business decisions. Traditional Extract-Transform-Load (ETL) processes are no longer sufficient and need to be re-architected to account for streaming, heterogeneity, usability, extensibility (custom processing), and continuous validity. Striim is a novel end-to-end distributed streaming ETL and intelligence platform that enables rapid development and deployment of streaming applications. Striim's real-time ETL engine has been architected from the ground up to enable both business users and developers to build and deploy streaming applications. In this paper, we describe some of the core features of Striim's ETL engine: (i) built-in adapters to extract and load data in real time from legacy and new cloud sources/targets; (ii) an extensible SQL-based transformation engine to transform events; (iii) a component called Open Processor through which users can inject custom logic; (iv) new primitives such as MODIFY, BEFORE, and AFTER; and (v) built-in data validation that continuously checks whether all data is making it to the destination.

Citations: 10
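The Striim abstract above lists built-in data validation that continuously checks whether everything extracted actually reaches the destination. The following is a minimal Python sketch of one way such a source-versus-target reconciliation could work, assuming every event carries a unique ID. The class `DeliveryValidator` and its methods are hypothetical and are not Striim's API or its validation mechanism.

```python
from collections import Counter


class DeliveryValidator:
    """Hypothetical reconciliation of extracted versus loaded events."""

    def __init__(self) -> None:
        self.extracted: Counter[str] = Counter()  # event ID -> times read from the source
        self.loaded: Counter[str] = Counter()     # event ID -> times written to the target

    def on_extract(self, event_id: str) -> None:
        self.extracted[event_id] += 1

    def on_load(self, event_id: str) -> None:
        self.loaded[event_id] += 1

    def report(self) -> dict[str, list[str]]:
        """List events still missing at the target and events loaded more often than extracted."""
        # Counter subtraction keeps only positive counts, so each side shows its surplus.
        missing = sorted((self.extracted - self.loaded).elements())
        unexpected = sorted((self.loaded - self.extracted).elements())
        return {"missing": missing, "unexpected": unexpected}


validator = DeliveryValidator()
for event_id in ["e1", "e2", "e3"]:
    validator.on_extract(event_id)
for event_id in ["e1", "e3", "e3"]:  # e2 was lost, e3 was delivered twice
    validator.on_load(event_id)
print(validator.report())  # {'missing': ['e2'], 'unexpected': ['e3']}
```

In a continuously running pipeline, events in flight also appear as "missing" until they land, so a real validator would need to account for delivery lag; this sketch ignores that.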

Towards Automated Data Integration in Software Analytics

Proceedings of the International Workshop on Real-Time Business Intelligence and Analytics Pub Date : 2018-08-16 DOI: 10.1145/3242153.3242159

Silverio Martínez-Fernández, P. Jovanovic, Xavier Franch, Andreas Jedlitschka

Abstract: Software organizations want to be able to base their decisions on the latest set of available data and the real-time analytics derived from them. In order to support the "real-time enterprise" for software organizations and provide information transparency for diverse stakeholders, we integrate heterogeneous data sources about software analytics, such as static code analysis, testing results, issue tracking systems, and network monitoring systems. To deal with the heterogeneity of the underlying data sources, we follow an ontology-based data integration approach in this paper and define an ontology that captures the semantics of relevant data for software analytics. Furthermore, we focus on the integration of such data sources by proposing two approaches: a static one and a dynamic one. We first discuss the current static approach, with a predefined set of analytic views representing software quality factors, and further envision how this process could be automated to dynamically build custom user analyses using a semi-automatic platform for managing the lifecycle of analytics infrastructures.

Citations: 10
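The abstract above describes ontology-based data integration: heterogeneous sources are mapped onto shared concepts so they can be queried uniformly. The following is a minimal Python sketch of that mapping step, assuming each source delivers nested dictionaries. The ontology concepts, the source names `issue_tracker` and `ci_server`, their field names, and the helpers `get_path` and `integrate` are all hypothetical examples, not the ontology or tooling defined in the paper.

```python
from typing import Any

# Hypothetical ontology: concept -> properties every integrated record must carry.
ONTOLOGY: dict[str, list[str]] = {
    "Issue": ["id", "status", "created"],
    "TestRun": ["id", "passed", "failed"],
}

# Hypothetical mappings: (source, concept) -> {ontology property: dotted source field}.
MAPPINGS: dict[tuple[str, str], dict[str, str]] = {
    ("issue_tracker", "Issue"): {"id": "key", "status": "fields.status", "created": "fields.created"},
    ("ci_server", "TestRun"): {"id": "build", "passed": "passCount", "failed": "failCount"},
}


def get_path(record: dict[str, Any], dotted: str) -> Any:
    """Follow a dotted path such as 'fields.status' into a nested record."""
    value: Any = record
    for part in dotted.split("."):
        value = value[part]
    return value


def integrate(source: str, concept: str, records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Translate raw source records into rows described by the shared ontology."""
    mapping = MAPPINGS[(source, concept)]
    rows = []
    for record in records:
        row = {prop: get_path(record, mapping[prop]) for prop in ONTOLOGY[concept]}
        row["source"] = source  # keep provenance for later analysis
        rows.append(row)
    return rows


issues = integrate("issue_tracker", "Issue",
                   [{"key": "PRJ-1", "fields": {"status": "Open", "created": "2018-05-01"}}])
tests = integrate("ci_server", "TestRun", [{"build": 42, "passCount": 118, "failCount": 2}])
print(issues + tests)
```

Keeping the mappings as data rather than code is what would make a dynamic variant plausible: adding a new source means registering a new mapping, not rewriting the integration logic.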

Improving Machine-based Entity Resolution with Limited Human Effort: A Risk Perspective

Proceedings of the International Workshop on Real-Time Business Intelligence and Analytics Pub Date : 2018-05-31 DOI: 10.1145/3242153.3242156

Zhaoqiang Chen, Qun Chen, Boyi Hou, Ahmed Murtadha, Zhanhuai Li

Abstract: Purely machine-based solutions usually struggle with challenging classification tasks such as entity resolution (ER). To alleviate this problem, a recent trend is to involve humans in the resolution process, most notably through crowdsourcing. However, it remains very challenging to effectively improve machine-based entity resolution with limited human effort. In this paper, we investigate the problem of human and machine cooperation for ER from a risk perspective. We propose to select the machine-labeled instances at high risk of being mislabeled for manual verification. For this task, we present a risk model that takes into consideration the human-labeled instances as well as the output of machine resolution. Finally, we evaluate the performance of the proposed risk model on real data. Our experiments demonstrate that it can pick out the mislabeled instances with considerably higher accuracy than the existing alternatives. Given the same human-cost budget, it also achieves better resolution quality than the state-of-the-art approach based on active learning.

Citations: 6
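The entity resolution abstract above describes selecting the machine-labeled pairs most at risk of being mislabeled for manual verification, using both the machine's output and already human-labeled pairs. The following is a minimal Python sketch of that selection under a deliberately simple, hypothetical risk estimate: an uncertainty-based prior per pair, smoothed with the error rate humans observed for machine labels in the same probability bucket. The function `select_for_review` and its parameters are illustrative assumptions; this is not the risk model proposed in the paper.

```python
def _bin(p: float, bins: int) -> int:
    """Map a match probability in [0, 1] to one of `bins` equal-width buckets."""
    return min(int(p * bins), bins - 1)


def select_for_review(
    unverified: list[tuple[str, float]],
    verified: list[tuple[float, bool]],
    budget: int,
    bins: int = 5,
    threshold: float = 0.5,
    prior_weight: float = 2.0,
) -> list[str]:
    """Pick the `budget` machine-labeled pairs most likely to be mislabeled.

    unverified: (pair_id, machine match probability) for pairs only the machine labeled
    verified:   (machine match probability, human says match) for pairs humans checked
    """
    # Count how often the thresholded machine label disagreed with humans per bucket.
    errors = [0.0] * bins
    totals = [0.0] * bins
    for p, human_match in verified:
        b = _bin(p, bins)
        totals[b] += 1
        if (p >= threshold) != human_match:
            errors[b] += 1

    ranked = []
    for pair_id, p in unverified:
        b = _bin(p, bins)
        # Prior risk: chance the thresholded label is wrong if p were well calibrated.
        prior = 1 - p if p >= threshold else p
        # Smoothed bucket error rate: human evidence pulls the prior up or down.
        risk = (errors[b] + prior_weight * prior) / (totals[b] + prior_weight)
        ranked.append((-risk, pair_id))
    ranked.sort()  # highest risk first; ties broken by pair ID
    return [pair_id for _, pair_id in ranked[:budget]]


verified = [(0.90, False), (0.95, False), (0.85, True)]
unverified = [("a", 0.92), ("b", 0.45), ("c", 0.05)]
print(select_for_review(unverified, verified, budget=2))  # ['b', 'a']
```

Here the human feedback matters: pair `a` looks safe on its own (prior risk 0.08), but humans found two of three confident "match" labels in its bucket to be wrong, which raises its estimated risk to about 0.43 and moves it ahead of `c` in the review queue.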

Performing OLAP over Graph Data: Query Language, Implementation, and a Case Study

Proceedings of the International Workshop on Real-Time Business Intelligence and Analytics Pub Date : 2017-08-28 DOI: 10.1145/3129292.3129293

Leticia I. Gómez, B. Kuijpers, A. Vaisman

Citations: 16