Training sample selection for deep learning of distributed data
Zheng Jiang, Xiaoqing Zhu, Wai-tian Tan, Rob Liston
2017 IEEE International Conference on Image Processing (ICIP), 15 September 2017
DOI: 10.1109/ICIP.2017.8296670
The success of deep learning — in the form of multi-layer neural networks — depends critically on the volume and variety of training data. Its potential is greatly compromised when training data originate in a geographically distributed manner and are subject to bandwidth constraints. This paper presents a data sampling approach to deep learning that carefully discriminates among locally available training samples based on their relative importance. Towards this end, we propose two metrics for prioritizing candidate training samples as functions of their test trial outcome: correctness and confidence. Bandwidth-constrained simulations show significant performance gains for our proposed training sample selection schemes over conventional uniform sampling: up to 15× bandwidth reduction for the MNIST dataset and a 25% reduction in learning time for the CIFAR-10 dataset.
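The abstract names the two prioritization metrics, correctness and confidence, but does not give their formulas. Below is a minimal NumPy sketch of one plausible realization, assuming each distributed node can run the current model on its local candidates and score them before deciding which to transmit under a bandwidth budget. The function names (`sample_priorities`, `select_for_upload`) and the exact scoring rules are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def sample_priorities(probs, labels, metric="confidence"):
    """Rank candidate training samples by a test-trial outcome.

    probs  : (N, C) array of softmax outputs from the current model
    labels : (N,) array of true class indices for the local samples
    metric : "correctness" favors misclassified samples;
             "confidence" favors samples the model is unsure about.
    Returns an array of sample indices, highest priority first.
    """
    preds = probs.argmax(axis=1)
    if metric == "correctness":
        # Misclassified samples (score 1.0) outrank correct ones (0.0).
        scores = (preds != labels).astype(float)
    elif metric == "confidence":
        # Low peak probability means low confidence, hence high priority.
        scores = 1.0 - probs.max(axis=1)
    else:
        raise ValueError(f"unknown metric: {metric}")
    return np.argsort(-scores)

def select_for_upload(probs, labels, budget, metric="confidence"):
    """Pick the `budget` highest-priority samples to send upstream,
    as a bandwidth-aware alternative to uniform sampling."""
    return sample_priorities(probs, labels, metric)[:budget]

# Example: 1000 local samples, 10 classes, room to upload only 50.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(10), size=1000)
labels = rng.integers(0, 10, size=1000)
chosen = select_for_upload(probs, labels, budget=50)
```

One design consequence worth noting: a correctness score is binary, so ties among misclassified samples are broken arbitrarily, whereas a confidence score yields a finer-grained ordering; this is one reason a scheme might combine or alternate between the two metrics.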