Dynamic Model Evaluation to Accelerate Distributed Machine Learning

Simon Caton, S. Venugopal, TN ShashiBhushan, Vidya Sankar Velamuri, K. Katrinis
{"title":"Dynamic Model Evaluation to Accelerate Distributed Machine Learning","authors":"Simon Caton, S. Venugopal, TN ShashiBhushan, Vidya Sankar Velamuri, K. Katrinis","doi":"10.1109/BigDataCongress.2018.00027","DOIUrl":null,"url":null,"abstract":"The increase in the volume and variety of data has increased the reliance of data scientists on shared computational resources, either in-house or obtained via cloud providers, to execute machine learning and artificial intelligence programs. This, in turn, has created challenges of exploiting available resources to execute such \"cognitive workloads\" quickly and effectively to gather the needed knowledge and data insight. A common challenge in machine learning is knowing when to stop model building. This is often exacerbated in the presence of big data as a trade off between the cost of producing the model (time, volume of training data, resources utilised) and its general performance. Whilst there are many tools and application stacks available to train models over distributed resources, the challenge of knowing when a model is \"good enough\" or no longer worth pursuing persists. In this paper, we propose a framework for the evaluating the models produced by distributed machine learning algorithms during the training process. This framework integrates with the cluster job scheduler so as to finalise model training under constraints of resource availability or time, or simply because model performance is asymptotic with further training. We present a prototype implementation of this framework using Apache Spark and YARN, and demonstrate the benefits of this approach using sample applications with both supervised and unsupervised learning algorithms.","PeriodicalId":177250,"journal":{"name":"2018 IEEE International Congress on Big Data (BigData Congress)","volume":"118 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE International Congress on Big Data (BigData Congress)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/BigDataCongress.2018.00027","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

The increase in the volume and variety of data has increased the reliance of data scientists on shared computational resources, either in-house or obtained via cloud providers, to execute machine learning and artificial intelligence programs. This, in turn, has created challenges of exploiting available resources to execute such "cognitive workloads" quickly and effectively to gather the needed knowledge and data insight. A common challenge in machine learning is knowing when to stop model building. This is often exacerbated in the presence of big data as a trade-off between the cost of producing the model (time, volume of training data, resources utilised) and its general performance. Whilst there are many tools and application stacks available to train models over distributed resources, the challenge of knowing when a model is "good enough" or no longer worth pursuing persists. In this paper, we propose a framework for evaluating the models produced by distributed machine learning algorithms during the training process. This framework integrates with the cluster job scheduler so as to finalise model training under constraints of resource availability or time, or simply because model performance is asymptotic with further training. We present a prototype implementation of this framework using Apache Spark and YARN, and demonstrate the benefits of this approach using sample applications with both supervised and unsupervised learning algorithms.
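The stopping behaviour the abstract describes can be illustrated with a short sketch. The Python fragment below is a minimal illustration under assumptions, not the paper's actual implementation: the `step()` and `score()` callables are hypothetical stand-ins for one distributed training increment (e.g. a Spark job submitted through YARN) and a held-out evaluation, and training stops when the metric improvement plateaus or a wall-clock budget is exhausted.

```python
# Minimal sketch (assumed interface, not the paper's code): evaluate the model
# periodically during training and stop when performance is asymptotic or a
# time budget runs out.

import time


def train_with_dynamic_evaluation(step, score, max_iters=100,
                                  time_budget_s=600.0,
                                  min_improvement=1e-3, patience=3):
    """Run `step()` repeatedly; stop on plateau or an exhausted time budget."""
    start = time.monotonic()
    best = float("-inf")
    stalled = 0
    for it in range(max_iters):
        step()                      # one training increment (e.g. a Spark job)
        current = score()           # evaluate the current model on held-out data
        if current - best < min_improvement:
            stalled += 1            # improvement below threshold: count as stalled
        else:
            stalled, best = 0, current
        if stalled >= patience:
            return it, "asymptotic performance"
        if time.monotonic() - start > time_budget_s:
            return it, "time budget exhausted"
    return max_iters, "max iterations reached"


if __name__ == "__main__":
    # Toy usage: a metric with diminishing returns triggers the plateau rule.
    state = {"iters": 0}
    def step(): state["iters"] += 1
    def score(): return 1.0 - 0.5 ** state["iters"]
    print(train_with_dynamic_evaluation(step, score))
```

In the setting described by the paper, such stopping decisions would be taken in coordination with the cluster job scheduler (YARN) rather than inside a local loop, so that resources can be released as soon as further training is no longer worthwhile.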