Machine Learning Systems: A Survey from a Data-Oriented Perspective
Christian Cabrera, Andrei Paleyes, Pierre Thodoroff, Neil Lawrence
ACM Computing Surveys. Published 2025-09-24. DOI: 10.1145/3769292 (https://doi.org/10.1145/3769292)
Citation count: 0
Abstract
Engineers are deploying ML models as parts of real-world systems with the upsurge of AI technologies. Real-world environments challenge the deployment of such systems because these environments produce large amounts of heterogeneous data, and users require increasingly efficient responses. These requirements push prevalent software architectures to the limit when deploying ML-based systems. Data-Oriented Architecture (DOA) is an emerging style that better equips systems to integrate ML models. Even though papers on deployed ML-based systems do not mention DOA, their authors make design decisions that implicitly follow DOA. Implicit decisions create a knowledge gap, limiting practitioners’ ability to implement ML-based systems. This paper surveys why, how, and to what extent practitioners have adopted DOA to implement ML-based systems. We overcome the knowledge gap by answering these questions and explicitly showing the design decisions and practices behind these systems. The survey follows a well-known systematic and semi-automated methodology for reviewing papers in software engineering. The majority of reviewed works partially adopt DOA. Such an adoption enables systems to address big data management, low-latency processing, resource management, security, and privacy requirements. Based on these findings, we formulate practical advice to facilitate the deployment of ML-based systems.
About the journal:
ACM Computing Surveys (CSUR) is an academic journal that publishes surveys and tutorials on areas of computing research and practice. The journal aims to provide comprehensive and easily understandable articles that guide readers through the literature and help them understand topics outside their specialties. CSUR had a 2022 Impact Factor of 16.6 and ranked 3rd out of 111 journals in the field of Computer Science Theory & Methods.
ACM Computing Surveys is indexed and abstracted in various services, including AI2 Semantic Scholar, Baidu, Clarivate/ISI: JCR, CNKI, DeepDyve, DTU, EBSCO: EDS/HOST, and IET Inspec, among others.