Troubleshooting deep-learner training data problems using an evolutionary algorithm on Summit

IF 1.3, Zone 4 (Computer Science), JCR Q1 (Computer Science)

IBM Journal of Research and Development, Pub Date: 2019-12-17, DOI: 10.1147/JRD.2019.2960225

M. Coletti;A. Fafard;D. Page

IBM Journal of Research and Development, vol. 64, no. 3/4, pp. 1-12. Journal Article. https://ieeexplore.ieee.org/document/8935167/

Citations: 6

Abstract

Architectural and hyperparameter design choices influence deep-learner (DL) model fidelity, but fidelity can also be degraded by malformed training and validation data. Practitioners may spend significant time refining layers and hyperparameters before discovering that distorted training data were impeding training progress. We found that an evolutionary algorithm (EA) can be used to troubleshoot this kind of DL problem. An EA evaluated thousands of DL configurations on Summit without yielding any overall improvement in DL performance, which suggested problems with the training and validation data. We suspected that the contrast-limited adaptive histogram equalization (CLAHE) enhancement applied to previously generated digital surface models, in which we were training DLs to find errors, had damaged the training data. Subsequent runs with an alternative global normalization yielded significantly improved DL performance. However, the DLs' intersection-over-union (IoU) scores remained consistently subpar, which suggested further problems with the training data and the DL approach. Nonetheless, we were able to diagnose this problem within a 12-hour span via Summit runs, which prevented several weeks of unproductive trial-and-error refinement of DL configurations and allowed a more timely convergence on an ultimately viable solution.
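The two quantities the abstract relies on, a global normalization of the input tiles and the intersection-over-union score, can be illustrated with a minimal Python sketch. The abstract does not say which global normalization was used, so the min-max rescaling with dataset-wide bounds shown here is an assumption, as are the function names and the toy values; the IoU function follows the standard definition for binary masks.

```python
def global_minmax_normalize(tile, global_min, global_max):
    """Rescale a 2D tile to [0, 1] using dataset-wide bounds.

    Assumed form of "global normalization": every tile shares the same
    bounds, unlike CLAHE, which adapts contrast locally within a tile.
    """
    span = global_max - global_min
    if span <= 0:
        raise ValueError("global_max must exceed global_min")
    return [
        [min(1.0, max(0.0, (value - global_min) / span)) for value in row]
        for row in tile
    ]


def intersection_over_union(pred_mask, true_mask):
    """Standard IoU of two binary masks; returns 1.0 if both are empty."""
    intersection = 0
    union = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for pred, true in zip(pred_row, true_row):
            if pred and true:
                intersection += 1
            if pred or true:
                union += 1
    return 1.0 if union == 0 else intersection / union


# Toy surface-model tile (elevations in metres) and toy error masks.
# These values are illustrative only and do not come from the paper.
tile = [[100.0, 150.0, 200.0], [120.0, 180.0, 300.0]]
normalized = global_minmax_normalize(tile, global_min=100.0, global_max=300.0)

predicted = [[1, 1, 0], [0, 1, 0]]
reference = [[1, 0, 0], [0, 1, 1]]
iou = intersection_over_union(predicted, reference)
print(normalized[0], iou)
```

In the toy masks, two cells overlap and four cells are marked in at least one mask, so the IoU is 0.5.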


Source journal

IBM Journal of Research and Development (Engineering & Technology, Computer Science: Hardware)

Self-citation rate: 0.00%

Review time: 6-12 weeks

Journal description: The IBM Journal of Research and Development is a peer-reviewed technical journal, published bimonthly, which features the work of authors in the science, technology, and engineering of information systems. Papers are written for the worldwide scientific research and development community and knowledgeable professionals. Submissions are welcome from the IBM technical community and from non-IBM authors on topics relevant to the scientific and technical content of the Journal.