Apache Wayang: A Unified Data Analytics Framework
Authors: Kaustubh Beedkar, Bertty Contreras-Rojas, Haralampos Gavriilidis, Zoi Kaoudi, Volker Markl, Rodrigo Pardo-Meza, Jorge-Arnulfo Quiané-Ruiz
Journal: SIGMOD Record
Published: 2023-10-30
DOI: 10.1145/3631504.3631510 (https://doi.org/10.1145/3631504.3631510)
Abstract
The large variety of specialized data processing platforms and the increased complexity of data analytics have led to the need for unifying data analytics within a single framework. Such a framework should free users from the burden of (i) choosing the right platform(s) and (ii) gluing code between the different parts of their pipelines. Apache Wayang (Incubating) is the only open-source framework that provides a systematic solution to unified data analytics by integrating multiple heterogeneous data processing platforms. It achieves that by decoupling applications from the underlying platforms and providing an optimizer so that users do not have to specify the platforms on which their pipeline should run. Wayang provides a unified view and processing model, effectively integrating the hodgepodge of heterogeneous platforms into a single framework with increased usability without sacrificing performance and total cost of ownership. In this paper, we present the architecture of Wayang, describe its main components, and give an outlook on future directions.
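The idea of decoupling a pipeline from its execution platform and letting an optimizer pick the platform can be illustrated with a toy sketch. This is not Wayang's API: the `Platform` and `Operator` classes, the `choose_platform` and `run` functions, and all cost numbers are hypothetical, and the sketch ignores factors such as the cost of moving data between platforms.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Platform:
    """Hypothetical platform with a linear cost model (illustrative numbers only)."""
    name: str
    startup_cost: float
    cost_per_record: float

    def estimate(self, n_records: int) -> float:
        # Estimated cost = fixed startup overhead + per-record processing cost.
        return self.startup_cost + self.cost_per_record * n_records


@dataclass(frozen=True)
class Operator:
    """Platform-agnostic operator: the user states what to compute, not where."""
    name: str
    fn: Callable[[List[int]], List[int]]


def choose_platform(platforms: List[Platform], n_records: int) -> Platform:
    # Pick the platform with the lowest estimated cost for this input size.
    return min(platforms, key=lambda p: p.estimate(n_records))


def run(plan: List[Operator], data: List[int],
        platforms: List[Platform]) -> Tuple[List[int], List[str]]:
    chosen = []
    for op in plan:
        platform = choose_platform(platforms, len(data))
        chosen.append(platform.name)
        # A real system would hand the operator to the chosen platform's
        # executor; this sketch simply applies the function in-process.
        data = op.fn(data)
    return data, chosen


PLATFORMS = [
    Platform("local", startup_cost=0.0, cost_per_record=1.0),
    Platform("cluster", startup_cost=1000.0, cost_per_record=0.01),
]

PLAN = [
    Operator("filter_even", lambda xs: [x for x in xs if x % 2 == 0]),
    Operator("double", lambda xs: [2 * x for x in xs]),
]

if __name__ == "__main__":
    result, chosen = run(PLAN, list(range(10)), PLATFORMS)
    print(result, chosen)
```

The pipeline in `PLAN` never names a platform. Under this assumed cost model, small inputs go to the platform with no startup overhead and large inputs go to the one with the lower per-record cost.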
About the journal:
SIGMOD investigates the development and application of database technology to support the full range of data management needs. The scope of interests and members is wide, with an almost equal mix of people from industry and academia. SIGMOD sponsors an annual conference that is regarded as one of the most important in the field, particularly for practitioners.
Areas of Special Interest:
Active and temporal data management, data mining and models, database programming languages, databases on the WWW, distributed data management, engineering, federated multi-database and mobile management, query processing & optimization, rapid application development tools, spatial data management, user interfaces.