Training sample selection for deep learning of distributed data
Zheng Jiang, Xiaoqing Zhu, Wai-tian Tan, Rob Liston
2017 IEEE International Conference on Image Processing (ICIP), published 2017-09-15
DOI: 10.1109/ICIP.2017.8296670
Citations: 5
Abstract
The success of deep learning, in the form of multi-layer neural networks, depends critically on the volume and variety of training data. Its potential is greatly compromised when training data originate in a geographically distributed manner and are subject to bandwidth constraints. This paper presents a data sampling approach to deep learning that carefully discriminates among locally available training samples based on their relative importance. Towards this end, we propose two metrics for prioritizing candidate training samples as functions of their test trial outcome: correctness and confidence. Bandwidth-constrained simulations show significant performance gains of our proposed training sample selection schemes over conventional uniform sampling: up to 15× bandwidth reduction for the MNIST dataset and a 25% reduction in learning time for the CIFAR-10 dataset.
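A minimal sketch of the selection idea described in the abstract is given below. The abstract names the two prioritization metrics (correctness and confidence, both derived from a test trial of the current model on local candidates) but not the exact scoring formula, so the `priority` heuristic here, which favors misclassified and low-confidence samples, is an illustrative assumption rather than the paper's method; the function name and budget parameter are likewise hypothetical.

```python
import numpy as np

def select_samples(model_probs, labels, budget):
    """Rank local candidate samples by an assumed priority score and
    keep only as many as the bandwidth budget allows.

    model_probs : (N, C) array of softmax outputs from a test trial of
                  the current model on the N local candidate samples.
    labels      : (N,) array of ground-truth class indices.
    budget      : number of samples the constrained link can carry.

    NOTE: the priority function below (favoring misclassified and
    low-confidence samples) is an assumption for illustration; the
    paper's exact formula is not given in the abstract.
    """
    predictions = model_probs.argmax(axis=1)
    correctness = (predictions == labels).astype(float)  # 1 if correct
    confidence = model_probs.max(axis=1)                 # top softmax prob

    # Assumed heuristic: wrong or uncertain samples are treated as most
    # informative, so they receive the highest transmission priority.
    priority = (1.0 - correctness) + (1.0 - confidence)

    # Transmit only the highest-priority samples that fit in the budget.
    ranked = np.argsort(-priority)
    return ranked[:budget]

# Usage: score 1000 local candidates, send the 100 highest-priority ones.
probs = np.random.dirichlet(np.ones(10), size=1000)  # stand-in softmax outputs
labels = np.random.randint(0, 10, size=1000)
chosen = select_samples(probs, labels, budget=100)
```

Under this reading, uniform sampling corresponds to ignoring `priority` and drawing `budget` candidates at random, which is the baseline the abstract's bandwidth and learning-time comparisons are made against.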