面向生产中的可信机器学习：MLOps方法鲁棒性概述

IF 23.8 1区计算机科学 Q1 COMPUTER SCIENCE, THEORY & METHODS

ACM Computing Surveys Pub Date : 2024-12-18 DOI:10.1145/3708497

Firas Bayram, Bestoun S. Ahmed

{"title":"面向生产中的可信机器学习：MLOps方法鲁棒性概述","authors":"Firas Bayram, Bestoun S. Ahmed","doi":"10.1145/3708497","DOIUrl":null,"url":null,"abstract":"Artificial intelligence (AI), and especially its sub-field of Machine Learning (ML), are impacting the daily lives of everyone with their ubiquitous applications. In recent years, AI researchers and practitioners have introduced principles and guidelines to build systems that make reliable and trustworthy decisions. From a practical perspective, conventional ML systems process historical data to extract the features that are consequently used to train ML models that perform the desired task. However, in practice, a fundamental challenge arises when the system needs to be operationalized and deployed to evolve and operate in real-life environments continuously. To address this challenge, Machine Learning Operations (MLOps) have emerged as a potential recipe for standardizing ML solutions in deployment. Although MLOps demonstrated great success in streamlining ML processes, thoroughly defining the specifications of robust MLOps approaches remains of great interest to researchers and practitioners. In this paper, we provide a comprehensive overview of the trustworthiness property of MLOps systems. Specifically, we highlight technical practices to achieve robust MLOps systems. In addition, we survey the existing research approaches that address the robustness aspects of ML systems in production. We also review the tools and software available to build MLOps systems and summarize their support to handle the robustness aspects. Finally, we present the open challenges and propose possible future directions and opportunities within this emerging field. The aim of this paper is to provide researchers and practitioners working on practical AI applications with a comprehensive view to adopt robust ML solutions in production environments.","PeriodicalId":50926,"journal":{"name":"ACM Computing Surveys","volume":"28 1","pages":""},"PeriodicalIF":23.8000,"publicationDate":"2024-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Towards Trustworthy Machine Learning in Production: An Overview of the Robustness in MLOps Approach\",\"authors\":\"Firas Bayram, Bestoun S. Ahmed\",\"doi\":\"10.1145/3708497\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Artificial intelligence (AI), and especially its sub-field of Machine Learning (ML), are impacting the daily lives of everyone with their ubiquitous applications. In recent years, AI researchers and practitioners have introduced principles and guidelines to build systems that make reliable and trustworthy decisions. From a practical perspective, conventional ML systems process historical data to extract the features that are consequently used to train ML models that perform the desired task. However, in practice, a fundamental challenge arises when the system needs to be operationalized and deployed to evolve and operate in real-life environments continuously. To address this challenge, Machine Learning Operations (MLOps) have emerged as a potential recipe for standardizing ML solutions in deployment. Although MLOps demonstrated great success in streamlining ML processes, thoroughly defining the specifications of robust MLOps approaches remains of great interest to researchers and practitioners. In this paper, we provide a comprehensive overview of the trustworthiness property of MLOps systems. Specifically, we highlight technical practices to achieve robust MLOps systems. In addition, we survey the existing research approaches that address the robustness aspects of ML systems in production. We also review the tools and software available to build MLOps systems and summarize their support to handle the robustness aspects. Finally, we present the open challenges and propose possible future directions and opportunities within this emerging field. The aim of this paper is to provide researchers and practitioners working on practical AI applications with a comprehensive view to adopt robust ML solutions in production environments.\",\"PeriodicalId\":50926,\"journal\":{\"name\":\"ACM Computing Surveys\",\"volume\":\"28 1\",\"pages\":\"\"},\"PeriodicalIF\":23.8000,\"publicationDate\":\"2024-12-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Computing Surveys\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1145/3708497\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, THEORY & METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Computing Surveys","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3708497","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}

引用次数: 0

摘要

人工智能（AI），尤其是它的子领域机器学习（ML），正以其无处不在的应用影响着每个人的日常生活。近年来，人工智能研究人员和从业者已经引入了一些原则和指导方针，以构建做出可靠和值得信赖决策的系统。从实际的角度来看，传统的机器学习系统处理历史数据以提取特征，从而用于训练执行所需任务的机器学习模型。然而，在实践中，当系统需要在实际环境中不断发展和运行时，一个基本的挑战就出现了。为了应对这一挑战，机器学习操作（MLOps）已经成为部署中标准化机器学习解决方案的潜在配方。尽管MLOps在简化ML过程方面取得了巨大的成功，但彻底定义健壮的MLOps方法的规范仍然是研究人员和实践者非常感兴趣的问题。在本文中，我们对MLOps系统的可信性进行了全面的概述。具体来说，我们强调了实现健壮的MLOps系统的技术实践。此外，我们调查了现有的研究方法，以解决生产中的机器学习系统的鲁棒性方面。我们还回顾了可用于构建MLOps系统的工具和软件，并总结了它们对处理健壮性方面的支持。最后，我们提出了开放的挑战，并提出了在这个新兴领域可能的未来方向和机会。本文的目的是为从事实际人工智能应用的研究人员和实践者提供一个全面的视角，以便在生产环境中采用强大的机器学习解决方案。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Towards Trustworthy Machine Learning in Production: An Overview of the Robustness in MLOps Approach

Artificial intelligence (AI), and especially its sub-field of Machine Learning (ML), are impacting the daily lives of everyone with their ubiquitous applications. In recent years, AI researchers and practitioners have introduced principles and guidelines to build systems that make reliable and trustworthy decisions. From a practical perspective, conventional ML systems process historical data to extract the features that are consequently used to train ML models that perform the desired task. However, in practice, a fundamental challenge arises when the system needs to be operationalized and deployed to evolve and operate in real-life environments continuously. To address this challenge, Machine Learning Operations (MLOps) have emerged as a potential recipe for standardizing ML solutions in deployment. Although MLOps demonstrated great success in streamlining ML processes, thoroughly defining the specifications of robust MLOps approaches remains of great interest to researchers and practitioners. In this paper, we provide a comprehensive overview of the trustworthiness property of MLOps systems. Specifically, we highlight technical practices to achieve robust MLOps systems. In addition, we survey the existing research approaches that address the robustness aspects of ML systems in production. We also review the tools and software available to build MLOps systems and summarize their support to handle the robustness aspects. Finally, we present the open challenges and propose possible future directions and opportunities within this emerging field. The aim of this paper is to provide researchers and practitioners working on practical AI applications with a comprehensive view to adopt robust ML solutions in production environments.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

ACM Computing Surveys 工程技术-计算机：理论方法

CiteScore

33.20

自引率

0.60%

发文量

372

审稿时长

12 months

期刊介绍： ACM Computing Surveys is an academic journal that focuses on publishing surveys and tutorials on various areas of computing research and practice. The journal aims to provide comprehensive and easily understandable articles that guide readers through the literature and help them understand topics outside their specialties. In terms of impact, CSUR has a high reputation with a 2022 Impact Factor of 16.6. It is ranked 3rd out of 111 journals in the field of Computer Science Theory & Methods. ACM Computing Surveys is indexed and abstracted in various services, including AI2 Semantic Scholar, Baidu, Clarivate/ISI: JCR, CNKI, DeepDyve, DTU, EBSCO: EDS/HOST, and IET Inspec, among others.