strim中的实时ETL

Alok Pareek, Bhushan Khaladkar, Rajkumar Sen, Basar Onat, V. Nadimpalli, L. Mahadevan
{"title":"strim中的实时ETL","authors":"Alok Pareek, Bhushan Khaladkar, Rajkumar Sen, Basar Onat, V. Nadimpalli, L. Mahadevan","doi":"10.1145/3242153.3242157","DOIUrl":null,"url":null,"abstract":"In the new digital economy, on demand access of real time enterprise data is critical to modernize cross organizational, cross partner, and online consumer functions. In addition to on premise legacy data, enterprises are producing an enormous amount of real-time data through new hybrid cloud applications; these event streams need to be collected, transformed and analyzed in real-time to make critical business decision. Traditional Extract-Load-Transform (ETL) processes are no longer sufficient and need to be re-architected to account for streaming, heterogeneity, usability, extensibility (custom processing), and continuous validity. Striim is a novel end-to-end distributed streaming ETL and intelligence platform that enables rapid development and deployment of streaming applications. Striim's real-time ETL engine has been architected from ground-up to enable both business users and developers to build and deploy streaming applications. In this paper, we describe some of the core features of Striim's ETL engine (i) built-in adapters to extract and load data in real-time from legacy and new cloud sources/targets (ii) an extensible SQL-based transformation engine to transform events; users can inject custom logic via a component called Open Processor (iv) New primitives like MODIFY, BEFORE and AFTER and (v) built-in data validation that continuously checks if everything is continually making it to the destination.","PeriodicalId":407894,"journal":{"name":"Proceedings of the International Workshop on Real-Time Business Intelligence and Analytics","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":"{\"title\":\"Real-time ETL in Striim\",\"authors\":\"Alok Pareek, Bhushan Khaladkar, Rajkumar Sen, Basar Onat, V. Nadimpalli, L. Mahadevan\",\"doi\":\"10.1145/3242153.3242157\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In the new digital economy, on demand access of real time enterprise data is critical to modernize cross organizational, cross partner, and online consumer functions. In addition to on premise legacy data, enterprises are producing an enormous amount of real-time data through new hybrid cloud applications; these event streams need to be collected, transformed and analyzed in real-time to make critical business decision. Traditional Extract-Load-Transform (ETL) processes are no longer sufficient and need to be re-architected to account for streaming, heterogeneity, usability, extensibility (custom processing), and continuous validity. Striim is a novel end-to-end distributed streaming ETL and intelligence platform that enables rapid development and deployment of streaming applications. Striim's real-time ETL engine has been architected from ground-up to enable both business users and developers to build and deploy streaming applications. In this paper, we describe some of the core features of Striim's ETL engine (i) built-in adapters to extract and load data in real-time from legacy and new cloud sources/targets (ii) an extensible SQL-based transformation engine to transform events; users can inject custom logic via a component called Open Processor (iv) New primitives like MODIFY, BEFORE and AFTER and (v) built-in data validation that continuously checks if everything is continually making it to the destination.\",\"PeriodicalId\":407894,\"journal\":{\"name\":\"Proceedings of the International Workshop on Real-Time Business Intelligence and Analytics\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-08-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"10\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the International Workshop on Real-Time Business Intelligence and Analytics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3242153.3242157\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the International Workshop on Real-Time Business Intelligence and Analytics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3242153.3242157","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 10

摘要

在新的数字经济中,实时企业数据的按需访问对于跨组织、跨合作伙伴和在线消费者功能的现代化至关重要。除了内部遗留数据外,企业还通过新的混合云应用程序生成大量实时数据;这些事件流需要实时收集、转换和分析,以做出关键的业务决策。传统的提取-负载-转换(ETL)过程不再足够,需要重新架构,以考虑流、异构性、可用性、可扩展性(自定义处理)和持续有效性。strim是一个新颖的端到端分布式流ETL和智能平台,支持流应用程序的快速开发和部署。strim的实时ETL引擎从头开始构建,使业务用户和开发人员都能够构建和部署流应用程序。在本文中,我们描述了Striim的ETL引擎的一些核心特性(i)内置适配器,用于从遗留的和新的云源/目标中实时提取和加载数据(ii)可扩展的基于sql的转换引擎,用于转换事件;用户可以通过一个名为Open Processor的组件注入自定义逻辑(iv)新的原语,如MODIFY, BEFORE和AFTER,以及(v)内置的数据验证,持续检查是否所有内容都持续地到达目的地。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Real-time ETL in Striim
In the new digital economy, on demand access of real time enterprise data is critical to modernize cross organizational, cross partner, and online consumer functions. In addition to on premise legacy data, enterprises are producing an enormous amount of real-time data through new hybrid cloud applications; these event streams need to be collected, transformed and analyzed in real-time to make critical business decision. Traditional Extract-Load-Transform (ETL) processes are no longer sufficient and need to be re-architected to account for streaming, heterogeneity, usability, extensibility (custom processing), and continuous validity. Striim is a novel end-to-end distributed streaming ETL and intelligence platform that enables rapid development and deployment of streaming applications. Striim's real-time ETL engine has been architected from ground-up to enable both business users and developers to build and deploy streaming applications. In this paper, we describe some of the core features of Striim's ETL engine (i) built-in adapters to extract and load data in real-time from legacy and new cloud sources/targets (ii) an extensible SQL-based transformation engine to transform events; users can inject custom logic via a component called Open Processor (iv) New primitives like MODIFY, BEFORE and AFTER and (v) built-in data validation that continuously checks if everything is continually making it to the destination.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信