Capturing Provenance from Deep Learning Applications Using Keras-Prov and Colab: a Practical Approach

Débora Pina, L. Kunstmann, Felipe Bevilaqua, Isabela Siqueira, Alan Lyra, Daniel de Oliveira, M. Mattoso
Journal of Information and Data Management, published 2022-12-19. DOI: https://doi.org/10.5753/jidm.2022.2544
Citations: 0

Abstract

Due to the exploratory nature of DNNs, DL specialists often need to modify the input dataset, change a filter when preprocessing input data, or fine-tune the models' hyperparameters while analyzing the evolution of the training. However, the specialist may lose track of which hyperparameter configurations have been used and tuned if these data are not properly registered. Thus, these configurations must be tracked and made available for the user's analysis. One way of doing this is to use provenance data derivation traces to help hyperparameter fine-tuning by providing a global picture of the data with clear dependencies. Current provenance solutions present provenance data disconnected from the W3C PROV recommendation, which makes it difficult to reproduce and compare with other provenance data. To address these challenges, we present Keras-Prov, an extension to the Keras deep learning library that collects provenance data compliant with PROV. To show the flexibility of Keras-Prov, we extend a previous Keras-Prov demonstration paper with larger experiments using GPUs with the help of Google Colab. Despite the challenges of running a DBMS in virtual environments, DL analysis with provenance has added trust and persistence through databases and PROV serializations. Experiments show Keras-Prov data analysis, during training execution, supporting hyperparameter fine-tuning decisions and favoring the comparison and reproducibility of such DL experiments. Keras-Prov is open source and can be downloaded from https://github.com/dbpina/keras-prov.
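The abstract describes capturing hyperparameter configurations and training metrics as PROV-compliant derivation traces during training. The sketch below is a minimal, hypothetical illustration of that idea in plain Python: a callback-style recorder that stores the hyperparameter configuration as a PROV entity and each epoch's metrics as entities derived from it. The class name, record layout, and method names are illustrative assumptions, not the actual Keras-Prov API; see the repository linked above for the real tool.

```python
# Hypothetical sketch of provenance capture during DL training,
# inspired by the Keras-Prov idea. NOT the actual Keras-Prov API.

class ProvCapture:
    """Records hyperparameters and per-epoch metrics as simple
    W3C PROV-style (entity, activity, derivation) records."""

    def __init__(self, hyperparameters):
        # The hyperparameter configuration is an input entity of the run.
        self.records = [{"type": "entity", "id": "hyperparameters",
                         "value": dict(hyperparameters)}]

    def on_epoch_end(self, epoch, metrics):
        # Each epoch is an activity; its metrics form an entity
        # derived from the hyperparameter configuration.
        self.records.append({"type": "activity", "id": f"epoch-{epoch}"})
        self.records.append({"type": "entity", "id": f"metrics-{epoch}",
                             "value": dict(metrics),
                             "wasDerivedFrom": "hyperparameters",
                             "wasGeneratedBy": f"epoch-{epoch}"})

# Usage: register the configuration, then log metrics after each epoch,
# as a Keras callback's on_epoch_end hook would.
prov = ProvCapture({"optimizer": "adam", "learning_rate": 1e-3, "epochs": 2})
for epoch, loss in enumerate([0.91, 0.42]):
    prov.on_epoch_end(epoch, {"loss": loss})

print(len(prov.records))  # 1 config entity + (activity + metrics) per epoch
```

Keeping derivation links (`wasDerivedFrom`) explicit is what gives the "global data picture with clear dependencies" mentioned in the abstract: any metric can be traced back to the exact configuration that produced it, which is what enables comparison and reproducibility across runs.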