Dynamic data management for continuous retraining

Proceedings of the 25th International Conference on Model Driven Engineering Languages and Systems: Companion Proceedings
Pub Date: 2022-10-23
DOI: 10.1145/3550356.3561568

Nils Baumann, Evgeny Kusmenko, Jonas Ritz, Bernhard Rumpe, Moritz Benedikt Weber


Citations: 1

Abstract

Managing dynamic datasets that serve as training data for a Machine Learning (ML) model is often challenging, especially when the data is altered iteratively and existing ML models should remain applicable to it. This is the case, for example, when new data versions arise from a generated or aggregated extension of a dataset on which a model has already been trained. This work investigates how a model-based approach can address these training data concerns and how the complete process, including the resulting training and retraining of the ML model, can be integrated into it. To this end, model-based concepts and an implementation are devised to cope with the complexity of iterative data management and to enable the integration of continuous retraining routines. With Deep Learning techniques having become technically feasible and having been developed extensively over the last decade, MLOps, which aims to establish DevOps practices tailored to ML projects, has gained crucial relevance. To the best of our knowledge, however, data management concepts for iteratively growing datasets with retraining capabilities, embedded in a model-driven ML development methodology, remain unexplored. To fill this gap, this contribution provides such agile data management concepts and integrates them, together with continuous retraining, into the model-driven ML framework MontiAnna [18]. The new functionality is evaluated in the context of a research project in which ML is used for the optimal design of lattice structures for crash applications.
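
The general idea of tying retraining to dataset versions can be illustrated with a minimal Python sketch. It is not MontiAnna's API or the authors' implementation: the names `fingerprint`, `DatasetRegistry`, `register`, `needs_retraining`, and `mark_trained` are hypothetical, and content hashing is only one assumed way to detect that a dataset has changed since a model was last trained.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Optional


def fingerprint(samples) -> str:
    """Return an order-independent SHA-256 digest of a collection of samples."""
    digest = hashlib.sha256()
    # Sort the textual form of each sample so that reordering does not
    # produce a new version; a zero byte separates consecutive samples.
    for text in sorted(repr(sample) for sample in samples):
        digest.update(text.encode("utf-8"))
        digest.update(b"\x00")
    return digest.hexdigest()


@dataclass
class DatasetRegistry:
    """Hypothetical registry: tracks dataset versions and the version last trained on."""

    versions: List[str] = field(default_factory=list)  # fingerprints, oldest first
    trained_on: Optional[str] = None

    def register(self, samples) -> str:
        """Record a new dataset version if its content differs from the latest one."""
        fp = fingerprint(samples)
        if not self.versions or self.versions[-1] != fp:
            self.versions.append(fp)
        return fp

    def needs_retraining(self) -> bool:
        """True if the latest dataset version differs from the one the model saw."""
        return bool(self.versions) and self.versions[-1] != self.trained_on

    def mark_trained(self) -> None:
        """Record that the model has been (re)trained on the latest version."""
        if not self.versions:
            raise ValueError("no dataset version registered")
        self.trained_on = self.versions[-1]


# Usage: an extended dataset yields a new version and triggers retraining.
registry = DatasetRegistry()
base = [(0.1, 0.2), (0.3, 0.4)]
registry.register(base)
registry.mark_trained()
registry.register(base + [(0.5, 0.6)])  # aggregated extension of the base data
retrain = registry.needs_retraining()
```

In this sketch, registering unchanged or merely reordered data does not create a new version, so a retraining routine that polls `needs_retraining()` would only run when the dataset content actually changes.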
