Dynamic data management for continuous retraining

Nils Baumann, Evgeny Kusmenko, Jonas Ritz, Bernhard Rumpe, Moritz Benedikt Weber
{"title":"Dynamic data management for continuous retraining","authors":"Nils Baumann, Evgeny Kusmenko, Jonas Ritz, Bernhard Rumpe, Moritz Benedikt Weber","doi":"10.1145/3550356.3561568","DOIUrl":null,"url":null,"abstract":"Managing dynamic datasets intended to serve as training data for a Machine Learning (ML) model often emerges as very challenging, especially when data is often altered iteratively and already existing ML models should pertain to the data. For example, this applies when new data versions arise from either a generated or aggregated extension of an existing dataset a model has already been trained on. In this work, it is investigated on how a model-based approach for these training data concerns can be provided as well as how the complete process, including the resulting training and retraining process of the ML model, can therein be integrated. Hence, model-based concepts and the implementation are devised to cope with the complexity of iterative data management as an enabler for the integration of continuous retraining routines. With Deep Learning techniques becoming technically feasible and massively being developed further over the last decade, MLOps, aiming to establish DevOps tailored to ML projects, gained crucial relevance. Unfortunately, data-management concepts for iteratively growing datasets with retraining capabilities embedded in a model-driven ML development methodology are unexplored to the best of our knowledge. To fill in this gap, this contribution provides such agile data management concepts and integrates them and continuous retraining into the model-driven ML Framework MontiAnna [18]. The new functionality is evaluated in the context of a research project where ML is exploited for the optimal design of lattice structures for crash applications.","PeriodicalId":182662,"journal":{"name":"Proceedings of the 25th International Conference on Model Driven Engineering Languages and Systems: Companion Proceedings","volume":"31 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 25th International Conference on Model Driven Engineering Languages and Systems: Companion Proceedings","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3550356.3561568","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Managing dynamic datasets intended to serve as training data for a Machine Learning (ML) model often emerges as very challenging, especially when data is often altered iteratively and already existing ML models should pertain to the data. For example, this applies when new data versions arise from either a generated or aggregated extension of an existing dataset a model has already been trained on. In this work, it is investigated on how a model-based approach for these training data concerns can be provided as well as how the complete process, including the resulting training and retraining process of the ML model, can therein be integrated. Hence, model-based concepts and the implementation are devised to cope with the complexity of iterative data management as an enabler for the integration of continuous retraining routines. With Deep Learning techniques becoming technically feasible and massively being developed further over the last decade, MLOps, aiming to establish DevOps tailored to ML projects, gained crucial relevance. Unfortunately, data-management concepts for iteratively growing datasets with retraining capabilities embedded in a model-driven ML development methodology are unexplored to the best of our knowledge. To fill in this gap, this contribution provides such agile data management concepts and integrates them and continuous retraining into the model-driven ML Framework MontiAnna [18]. The new functionality is evaluated in the context of a research project where ML is exploited for the optimal design of lattice structures for crash applications.
动态数据管理,持续再培训
管理旨在作为机器学习(ML)模型训练数据的动态数据集通常是非常具有挑战性的,特别是当数据经常被迭代更改并且已经存在的ML模型应该与数据相关时。例如,当新数据版本来自一个模型已经训练过的现有数据集的生成或聚合扩展时,这就适用了。在这项工作中,研究了如何为这些训练数据关注点提供基于模型的方法,以及如何将完整的过程(包括ML模型的最终训练和再训练过程)集成在一起。因此,基于模型的概念和实现被设计用来处理迭代数据管理的复杂性,作为持续再训练例程集成的推动者。随着深度学习技术在技术上变得可行,并且在过去十年中得到了进一步的大规模开发,旨在建立适合ML项目的DevOps的MLOps获得了至关重要的相关性。不幸的是,据我们所知,在模型驱动的ML开发方法中嵌入的具有再训练功能的迭代增长数据集的数据管理概念尚未得到探索。为了填补这一空白,该贡献提供了这样的敏捷数据管理概念,并将它们集成到模型驱动的ML框架MontiAnna[18]中。新功能是在一个研究项目的背景下进行评估的,在这个研究项目中,机器学习被用于崩溃应用程序的晶格结构的最佳设计。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信