Process-driven design of cloud data platforms

IF 3.4 · Zone 2, Computer Science · JCR Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS

Information Systems Pub Date : 2025-02-04 DOI:10.1016/j.is.2025.102527

Matteo Francia, Matteo Golfarelli, Manuele Pasini

Information Systems, volume 131, Article 102527. https://www.sciencedirect.com/science/article/pii/S0306437925000122

Citations: 0

Abstract

Data platforms are state-of-the-art solutions for implementing data-driven applications and analytics. They facilitate the ingestion, storage, management, and exploitation of big data. Data platforms are built on top of complex ecosystems of services answering different data needs and requirements; such ecosystems are offered by different providers (e.g., Amazon AWS and Microsoft Azure). However, when it comes to engineering data platforms, no unifying strategy and methodology is available yet, and the design is mainly left to the expertise of practitioners in the field. Service providers simply expose a long list of interoperable and alternative engines, making it hard to select the optimal subset without a deep knowledge of the ecosystem. A more effective design approach starts with knowledge of the data transformation and exploitation processes that the platform should support. In this paper, we sketch a computer-aided design methodology and then focus on the selection of the optimal services needed to implement such processes. We show that our approach lightens the design of data platforms and enables an unbiased selection and comparison of solutions even through different service ecosystems.
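The selection problem mentioned in the abstract can be illustrated with a toy sketch. This is not the authors' methodology, which the abstract does not detail; it assumes that a process can be reduced to a set of required capabilities and that each service offers some capabilities at a cost. The catalog, capability names, and costs below are all hypothetical, and the exhaustive search is only practical for very small catalogs.

```python
from itertools import combinations

# Hypothetical catalog: service name -> (capabilities offered, illustrative cost).
# None of these names or numbers come from the paper or from a real provider.
CATALOG = {
    "object_store": ({"storage"}, 1.0),
    "batch_engine": ({"batch_transform"}, 3.0),
    "stream_engine": ({"ingestion", "stream_transform"}, 4.0),
    "warehouse": ({"storage", "sql_analytics"}, 5.0),
    "all_in_one": ({"ingestion", "storage", "batch_transform"}, 6.0),
}


def cheapest_cover(required, catalog):
    """Return (cost, services) for the cheapest subset of services whose
    combined capabilities include every required one, or None if no
    subset covers the requirements. Brute force over all subsets."""
    best = None
    names = sorted(catalog)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            offered = set().union(*(catalog[name][0] for name in subset))
            if required <= offered:
                cost = sum(catalog[name][1] for name in subset)
                if best is None or cost < best[0]:
                    best = (cost, subset)
    return best


# Capabilities needed by one hypothetical process.
process = {"ingestion", "storage", "sql_analytics"}
result = cheapest_cover(process, CATALOG)
print(result)
```

Starting from the process requirements rather than from the service list mirrors the design direction the abstract argues for: the catalog can be swapped for another provider's without changing the description of the process.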


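Source journal: Information Systems (Engineering and Technology, Computer Science: Information Systems)

CiteScore: 9.40

Self-citation rate: 2.70%

Articles published: 112

Review time: 53 days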

About the journal: Information systems are the software and hardware systems that support data-intensive applications. The journal Information Systems publishes articles concerning the design and implementation of languages, data models, process models, algorithms, software and hardware for information systems. Subject areas include data management issues as presented in the principal international database conferences (e.g., ACM SIGMOD/PODS, VLDB, ICDE and ICDT/EDBT) as well as data-related issues from the fields of data mining/machine learning, information retrieval coordinated with structured data, internet and cloud data management, business process management, web semantics, visual and audio information systems, scientific computing, and data science. Implementation papers having to do with massively parallel data management, fault tolerance in practice, and special purpose hardware for data-intensive systems are also welcome. Manuscripts from application domains, such as urban informatics, social and natural science, and Internet of Things, are also welcome. All papers should highlight innovative solutions to data management problems, such as new data models and performance enhancements, and show how those innovations contribute to the goals of the application.