Concept drift detection for distributed multi-model machine learning systems

2022 IEEE 46th Annual Computers, Software, and Applications Conference (COMPSAC). Pub Date: 2022-06-01. DOI: 10.1109/COMPSAC54236.2022.00168

Beverly Abadines Quon, J. Gaudiot


Abstract

Many works focus on optimizing machine learning models during the training phase but fail to account for how these models adapt to the model-serving phase once they are deployed in real-world applications. In this phase, models must process streams of data that can evolve over time and distort the relationship between incoming data, causing concept drift. This paper proposes leveraging the advantages of emerging feature stores to improve concept drift detection on unlabeled, dynamic data streams across multiple models. First, we introduce Drift Detection on Distributed Datasets (QuaD), which combines classical drift detectors to make use of labeled and unlabeled data and to create local context (i.e., per live model) and global context (i.e., across multiple models). Second, we propose using feature store entities, SHAP values, and Collaborative Filtering (CF) to augment unlabeled data across multiple models. To the best of our knowledge, QuaD is the first work that examines the collective behavior of concept drift across multiple models and discerns associations between models that may share a susceptibility in a dynamic setting. QuaD uses a combination of performance-based and data distribution-based drift detectors and CF to capture varying types of concept drift in labeled and unlabeled data streams, and it is modeled around the data abstraction provided by emerging feature stores.
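The abstract does not name the specific detectors QuaD combines, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a two-sample Kolmogorov-Smirnov test as the data distribution-based detector (usable without labels) and a DDM-style error-rate rule as the performance-based detector (usable when labels arrive). All function and class names, the `local_context`/`global_context` helpers, and the thresholds are hypothetical choices made for illustration.

```python
from bisect import bisect_right
from math import log, sqrt


def ks_statistic(reference, window):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    ref, win = sorted(reference), sorted(window)
    return max(
        abs(bisect_right(ref, x) / len(ref) - bisect_right(win, x) / len(win))
        for x in ref + win
    )


def distribution_drift(reference, window, alpha=0.05):
    """Label-free check: compare the KS statistic to its large-sample critical value."""
    n, m = len(reference), len(window)
    critical = sqrt(-0.5 * log(alpha / 2)) * sqrt((n + m) / (n * m))
    return ks_statistic(reference, window) > critical


class ErrorRateDetector:
    """DDM-style rule (assumed here): alarm when the running error rate plus its
    standard deviation exceeds the best level seen so far by three deviations."""

    def __init__(self, min_samples=30):
        self.n = 0
        self.errors = 0
        self.p_min = float("inf")
        self.s_min = float("inf")
        self.min_samples = min_samples

    def update(self, is_error):
        self.n += 1
        self.errors += int(is_error)
        p = self.errors / self.n
        s = sqrt(p * (1 - p) / self.n)
        if self.n < self.min_samples:
            return False  # too few samples for a stable estimate
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s  # remember the best operating point
        return p + s > self.p_min + 3 * self.s_min


def local_context(reference, window, detector=None, errors=()):
    """Per-model verdict from both detector families (hypothetical helper)."""
    performance = False
    if detector is not None:
        for is_error in errors:
            performance = detector.update(is_error) or performance
    return {
        "distribution": distribution_drift(reference, window),
        "performance": performance,
    }


def global_context(local_verdicts):
    """Across models: names of the models where any detector fired."""
    return sorted(name for name, v in local_verdicts.items() if any(v.values()))


# Usage: model_a sees a shifted feature window, model_b does not.
reference = list(range(100))
verdicts = {
    "model_a": local_context(reference, [x + 50 for x in reference]),
    "model_b": local_context(reference, reference),
}
print(global_context(verdicts))
```

The local verdict keeps the two detector families separate so that a model with no labels can still report distribution drift, while the global view only aggregates which models are affected. Discovering associations between models, which the abstract attributes to SHAP values and Collaborative Filtering, is outside the scope of this sketch.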

