Proceedings of the International Workshop on Real-Time Business Intelligence and Analytics最新文献

筛选
英文 中文
Query-Driven Data Profiling with OCEANProfile 查询驱动的数据分析与OCEANProfile
A. Wahl, Christian Sauerhammer, Peter K. Schwab, Sebastian Herbst, R. Lenz
{"title":"Query-Driven Data Profiling with OCEANProfile","authors":"A. Wahl, Christian Sauerhammer, Peter K. Schwab, Sebastian Herbst, R. Lenz","doi":"10.1145/3242153.3242154","DOIUrl":"https://doi.org/10.1145/3242153.3242154","url":null,"abstract":"Complex data analysis scenarios often require discovering and combining multiple data sources. Data scientists usually formulate a series of SQL queries building on each other, also called a session, to iteratively derive results. However, due to a lack of familiarity with data sources or the complexity of query results, it can be a hard task to decide on the next query iteration solely based on the results of the last one. While existing approaches provide mechanisms to assess the results of a specific query, support for analyzing results in the context of the respective session remains mostly absent. Such approaches do also not seamlessly integrate with established tools and workflows. To overcome these problems, we introduce OCEANProfile, a framework for session-based profiling of query results. Query results are intercepted at driver level and streamed into our framework for automated data profiling. Result profiles can be compared with those of previous queries and visualized in a companion app compatible with existing analysis tools. Visualizations are automatically ranked according to their usefulness in the context of the respective session.","PeriodicalId":407894,"journal":{"name":"Proceedings of the International Workshop on Real-Time Business Intelligence and Analytics","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130331395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
Oracle TimesTen Scaleout: A New Scale-Out In-Memory Database Architecture for Extreme OLTP Oracle TimesTen Scaleout:一种用于极限OLTP的新的内存扩展数据库体系结构
Y. Chou, A. Raghavan, T. Lahiri
{"title":"Oracle TimesTen Scaleout: A New Scale-Out In-Memory Database Architecture for Extreme OLTP","authors":"Y. Chou, A. Raghavan, T. Lahiri","doi":"10.1145/3242153.3271881","DOIUrl":"https://doi.org/10.1145/3242153.3271881","url":null,"abstract":"Oracle TimesTen Scaleout is a shared-nothing scale-out inmemory database designed for extreme OLTP workloads, such as IoT, real-time fraud detection, telecommunications etc. TimesTen Scaleout features rich SQL including complex queries with joins, aggregations and analytic functions, transparent distributed execution, full ACID multi-statement transactions, global secondary indexes and sequences. The design features built-in high availability using a k-safe data duplication mechanism, and transparent failure handling in order to minimize application down time. All management functions such as installation, configuration, and monitoring are provided via a centralized management repository. We describe some of the challenges in developing such a scale-out in-memory database architecture and show extreme performance results exceeding 100 million transactions per second and 1 billion SQL selects per second.","PeriodicalId":407894,"journal":{"name":"Proceedings of the International Workshop on Real-Time Business Intelligence and Analytics","volume":"32 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114270739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
Proceedings of the International Workshop on Real-Time Business Intelligence and Analytics 实时商业智能与分析国际研讨会论文集
{"title":"Proceedings of the International Workshop on Real-Time Business Intelligence and Analytics","authors":"","doi":"10.1145/3242153","DOIUrl":"https://doi.org/10.1145/3242153","url":null,"abstract":"","PeriodicalId":407894,"journal":{"name":"Proceedings of the International Workshop on Real-Time Business Intelligence and Analytics","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116248382","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
Moira: A Goal-Oriented Incremental Machine Learning Approach to Dynamic Resource Cost Estimation in Distributed Stream Processing Systems Moira:分布式流处理系统中动态资源成本估算的目标导向增量机器学习方法
D. Foroni, Cristian Axenie, S. Bortoli, Mohamad Al Hajj Hassan, Ralph Acker, R. Tudoran, G. Brasche, Yannis Velegrakis
{"title":"Moira: A Goal-Oriented Incremental Machine Learning Approach to Dynamic Resource Cost Estimation in Distributed Stream Processing Systems","authors":"D. Foroni, Cristian Axenie, S. Bortoli, Mohamad Al Hajj Hassan, Ralph Acker, R. Tudoran, G. Brasche, Yannis Velegrakis","doi":"10.1145/3242153.3242160","DOIUrl":"https://doi.org/10.1145/3242153.3242160","url":null,"abstract":"The need for real-time analysis is still spreading and the number of available streaming sources is increasing. The recent literature has plenty of works on Data Stream Processing (DSP). In a streaming environment, the data incoming rate varies over time. The challenge is how to efficiently deploy these applications in a cluster. Several works have been conducted on improving the latency of the system or to minimize the allocated resources per application through time. However, to the best of our knowledge, none of the existing works takes into consideration the user needs for a specific application, which is different from one user to another. In this paper, we propose Moria, a goal-oriented framework for dynamically optimizing the resource allocation built on top of Apache Flink. The system takes actions based on the user application and on the incoming data characteristics (i.e., input rate and window size). Starting from an initial estimation of the resources needed for the user query, at each iteration we improve our cost function with the collected metrics from the monitored system about the incoming data, to fulfill the user needs. We present a series of experiments that show in which cases our dynamic estimation outperforms the baseline Apache Flink and the thumb rule estimation alone performed at the deployment of the applications.","PeriodicalId":407894,"journal":{"name":"Proceedings of the International Workshop on Real-Time Business Intelligence and Analytics","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124676771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 5
Streams and Tables: Two Sides of the Same Coin 河流和桌子:同一枚硬币的两面
Matthias Sax, Guozhang Wang, M. Weidlich, J. Freytag
{"title":"Streams and Tables: Two Sides of the Same Coin","authors":"Matthias Sax, Guozhang Wang, M. Weidlich, J. Freytag","doi":"10.1145/3242153.3242155","DOIUrl":"https://doi.org/10.1145/3242153.3242155","url":null,"abstract":"Stream processing has emerged as a paradigm for applications that require low-latency evaluation of operators over unbounded sequences of data. Defining the semantics of stream processing is challenging in the presence of distributed data sources. The physical and logical order of data in a stream may become inconsistent in such a setting. Existing models either neglect these inconsistencies or handle them by means of data buffering and reordering techniques, thereby compromising processing latency. In this paper, we introduce the Dual Streaming Model to reason about physical and logical order in data stream processing. This model presents the result of an operator as a stream of successive updates, which induces a duality of results and streams. As such, it provides a natural way to cope with inconsistencies between the physical and logical order of streaming data in a continuous manner, without explicit buffering and reordering. We further discuss the trade-offs and challenges faced when implementing this model in terms of correctness, latency, and processing cost. A case study based on Apache Kafka illustrates the effectiveness of our model in the light of real-world requirements.","PeriodicalId":407894,"journal":{"name":"Proceedings of the International Workshop on Real-Time Business Intelligence and Analytics","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131120092","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 40
Anomaly Detection and Explanation Discovery on Event Streams 事件流的异常检测与解释发现
Fei Song, Boyao Zhou, Quan Sun, Wang Sun, Shiwen Xia, Y. Diao
{"title":"Anomaly Detection and Explanation Discovery on Event Streams","authors":"Fei Song, Boyao Zhou, Quan Sun, Wang Sun, Shiwen Xia, Y. Diao","doi":"10.1145/3242153.3242158","DOIUrl":"https://doi.org/10.1145/3242153.3242158","url":null,"abstract":"As enterprise information systems are collecting event streams from various sources, the ability of a system to automatically detect anomalous events and further provide human readable explanations is of paramount importance. In this position paper, we argue for the need of a new type of data stream analytics that can address anomaly detection and explanation discovery in a single, integrated system, which not only offers increased business intelligence, but also opens up opportunities for improved solutions. In particular, we propose a two-pass approach to building such a system, highlight the challenges, and offer initial directions for solutions.","PeriodicalId":407894,"journal":{"name":"Proceedings of the International Workshop on Real-Time Business Intelligence and Analytics","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127831923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
Real-time ETL in Striim strim中的实时ETL
Alok Pareek, Bhushan Khaladkar, Rajkumar Sen, Basar Onat, V. Nadimpalli, L. Mahadevan
{"title":"Real-time ETL in Striim","authors":"Alok Pareek, Bhushan Khaladkar, Rajkumar Sen, Basar Onat, V. Nadimpalli, L. Mahadevan","doi":"10.1145/3242153.3242157","DOIUrl":"https://doi.org/10.1145/3242153.3242157","url":null,"abstract":"In the new digital economy, on demand access of real time enterprise data is critical to modernize cross organizational, cross partner, and online consumer functions. In addition to on premise legacy data, enterprises are producing an enormous amount of real-time data through new hybrid cloud applications; these event streams need to be collected, transformed and analyzed in real-time to make critical business decision. Traditional Extract-Load-Transform (ETL) processes are no longer sufficient and need to be re-architected to account for streaming, heterogeneity, usability, extensibility (custom processing), and continuous validity. Striim is a novel end-to-end distributed streaming ETL and intelligence platform that enables rapid development and deployment of streaming applications. Striim's real-time ETL engine has been architected from ground-up to enable both business users and developers to build and deploy streaming applications. In this paper, we describe some of the core features of Striim's ETL engine (i) built-in adapters to extract and load data in real-time from legacy and new cloud sources/targets (ii) an extensible SQL-based transformation engine to transform events; users can inject custom logic via a component called Open Processor (iv) New primitives like MODIFY, BEFORE and AFTER and (v) built-in data validation that continuously checks if everything is continually making it to the destination.","PeriodicalId":407894,"journal":{"name":"Proceedings of the International Workshop on Real-Time Business Intelligence and Analytics","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116439741","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 10
Towards Automated Data Integration in Software Analytics 迈向软件分析中的自动化数据集成
Silverio Martínez-Fernández, P. Jovanovic, Xavier Franch, Andreas Jedlitschka
{"title":"Towards Automated Data Integration in Software Analytics","authors":"Silverio Martínez-Fernández, P. Jovanovic, Xavier Franch, Andreas Jedlitschka","doi":"10.1145/3242153.3242159","DOIUrl":"https://doi.org/10.1145/3242153.3242159","url":null,"abstract":"Software organizations want to be able to base their decisions on the latest set of available data and the real-time analytics derived from them. In order to support \"real-time enterprise\" for software organizations and provide information transparency for diverse stakeholders, we integrate heterogeneous data sources about software analytics, such as static code analysis, testing results, issue tracking systems, network monitoring systems, etc. To deal with the heterogeneity of the underlying data sources, we follow an ontology-based data integration approach in this paper and define an ontology that captures the semantics of relevant data for software analytics. Furthermore, we focus on the integration of such data sources by proposing two approaches: a static and a dynamic one. We first discuss the current static approach with a predefined set of analytic views representing software quality factors and further envision how this process could be automated in order to dynamically build custom user analysis using a semi-automatic platform for managing the lifecycle of analytics infrastructures.","PeriodicalId":407894,"journal":{"name":"Proceedings of the International Workshop on Real-Time Business Intelligence and Analytics","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116797021","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 10
Improving Machine-based Entity Resolution with Limited Human Effort: A Risk Perspective 用有限的人力改进基于机器的实体解析:风险视角
Zhaoqiang Chen, Qun Chen, Boyi Hou, Ahmed Murtadha, Zhanhuai Li
{"title":"Improving Machine-based Entity Resolution with Limited Human Effort: A Risk Perspective","authors":"Zhaoqiang Chen, Qun Chen, Boyi Hou, Ahmed Murtadha, Zhanhuai Li","doi":"10.1145/3242153.3242156","DOIUrl":"https://doi.org/10.1145/3242153.3242156","url":null,"abstract":"Pure machine-based solutions usually struggle in the challenging classification tasks such as entity resolution (ER). To alleviate this problem, a recent trend is to involve the human in the resolution process, most notably the crowdsourcing approach. However, it remains very challenging to effectively improve machine-based entity resolution with limited human effort. In this paper, we investigate the problem of human and machine cooperation for ER from a risk perspective. We propose to select the machine-labeled instances at high risk of being mislabeled for manual verification. For this task, we present a risk model that takes into consideration the human-labeled instances as well as the output of machine resolution. Finally, we evaluate the performance of the proposed risk model on real data. Our experiments demonstrate that it can pick up the mislabeled instances with considerably higher accuracy than the existing alternatives. Provided with the same amount of human cost budget, it can also achieve better resolution quality than the state-of-the-art approach based on active learning.","PeriodicalId":407894,"journal":{"name":"Proceedings of the International Workshop on Real-Time Business Intelligence and Analytics","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128074684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 6
Performing OLAP over Graph Data: Query Language, Implementation, and a Case Study 在图数据上执行OLAP:查询语言、实现和案例研究
Leticia I. Gómez, B. Kuijpers, A. Vaisman
{"title":"Performing OLAP over Graph Data: Query Language, Implementation, and a Case Study","authors":"Leticia I. Gómez, B. Kuijpers, A. Vaisman","doi":"10.1145/3129292.3129293","DOIUrl":"https://doi.org/10.1145/3129292.3129293","url":null,"abstract":"In current Big Data scenarios, traditional data warehousing and Online Analytical Processing (OLAP) operations on cubes are clearly not sufficient to address the current data analysis requirements. Nevertheless, OLAP operations and models can expand the possibilities of graph analysis beyond the traditional graph-based computation. In spite of this, there is not much work on the problem of taking OLAP analysis to the graph data model. In previous work we proposed a multidimensional (MD) data model for graph analysis, that considers not only the basic graph data, but background information in the form of dimension hierarchies as well. The graphs in our model are node- and edge-labelled directed multi-hypergraphs, called graphoids, defined at several different levels of granularity. In this paper we show how we implemented this proposal over the widely used Neo4J graph database, discuss implementation issues, and present a detailed case study to show how OLAP operations can be used on graphs.","PeriodicalId":407894,"journal":{"name":"Proceedings of the International Workshop on Real-Time Business Intelligence and Analytics","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114388126","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 16
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信