Training sample selection for deep learning of distributed data

Zheng Jiang, Xiaoqing Zhu, Wai-tian Tan, Rob Liston
{"title":"Training sample selection for deep learning of distributed data","authors":"Zheng Jiang, Xiaoqing Zhu, Wai-tian Tan, Rob Liston","doi":"10.1109/ICIP.2017.8296670","DOIUrl":null,"url":null,"abstract":"The success of deep learning — in the form of multi-layer neural networks — depends critically on the volume and variety of training data. Its potential is greatly compromised when training data originate in a geographically distributed manner and are subject to bandwidth constraints. This paper presents a data sampling approach to deep learning, by carefully discriminating locally available training samples based on their relative importance. Towards this end, we propose two metrics for prioritizing candidate training samples as functions of their test trial outcome: correctness and confidence. Bandwidth-constrained simulations show significant performance gain of our proposed training sample selection schemes over convention uniform sampling: up to 15× bandwidth reduction for the MNIST dataset and 25% reduction in learning time for the CIFAR-10 dataset.","PeriodicalId":229602,"journal":{"name":"2017 IEEE International Conference on Image Processing (ICIP)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE International Conference on Image Processing (ICIP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICIP.2017.8296670","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

Abstract

The success of deep learning — in the form of multi-layer neural networks — depends critically on the volume and variety of training data. Its potential is greatly compromised when training data originate in a geographically distributed manner and are subject to bandwidth constraints. This paper presents a data sampling approach to deep learning, carefully discriminating among locally available training samples based on their relative importance. Towards this end, we propose two metrics for prioritizing candidate training samples as functions of their test trial outcome: correctness and confidence. Bandwidth-constrained simulations show significant performance gains of our proposed training sample selection schemes over conventional uniform sampling: up to 15× bandwidth reduction for the MNIST dataset and a 25% reduction in learning time for the CIFAR-10 dataset.
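The abstract does not spell out the exact scoring functions, but the core idea is simple: run each locally held candidate sample through the current model as a test trial, record whether the prediction is correct and how confident it is, and spend the limited uplink bandwidth on the samples that score as most informative. Below is a minimal sketch of that selection step in Python; the model interface, the combined priority score, and all names here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def select_training_samples(model, samples, labels, budget):
    """Rank locally available samples by correctness/confidence and keep
    only as many as the bandwidth budget allows.

    Illustrative sketch: the paper proposes correctness- and
    confidence-based priority metrics, but the exact scoring and the
    model API (assumed here to return class probabilities from
    ``model.predict``) are assumptions.
    """
    # Test trial: evaluate the current model on each candidate sample.
    probs = model.predict(samples)        # shape: (n_samples, n_classes)
    predicted = np.argmax(probs, axis=1)
    confidence = np.max(probs, axis=1)    # confidence of the top prediction

    # Correctness metric: misclassified samples are more informative.
    wrong = predicted != labels

    # Priority: misclassified samples first, then low-confidence correct
    # ones. (One plausible combination; the paper may weight these
    # metrics differently.)
    priority = wrong.astype(float) + (1.0 - confidence)

    # Spend the bandwidth budget on the highest-priority samples.
    keep = np.argsort(-priority)[:budget]
    return samples[keep], labels[keep]
```

In a distributed setting, each data source would run a selection step like this locally and transmit only the retained samples to the central training node; this is presumably where the reported bandwidth and learning-time savings come from.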