Dynamic Model Evaluation to Accelerate Distributed Machine Learning

Simon Caton, S. Venugopal, TN ShashiBhushan, Vidya Sankar Velamuri, K. Katrinis
{"title":"Dynamic Model Evaluation to Accelerate Distributed Machine Learning","authors":"Simon Caton, S. Venugopal, TN ShashiBhushan, Vidya Sankar Velamuri, K. Katrinis","doi":"10.1109/BigDataCongress.2018.00027","DOIUrl":null,"url":null,"abstract":"The increase in the volume and variety of data has increased the reliance of data scientists on shared computational resources, either in-house or obtained via cloud providers, to execute machine learning and artificial intelligence programs. This, in turn, has created challenges of exploiting available resources to execute such \"cognitive workloads\" quickly and effectively to gather the needed knowledge and data insight. A common challenge in machine learning is knowing when to stop model building. This is often exacerbated in the presence of big data as a trade off between the cost of producing the model (time, volume of training data, resources utilised) and its general performance. Whilst there are many tools and application stacks available to train models over distributed resources, the challenge of knowing when a model is \"good enough\" or no longer worth pursuing persists. In this paper, we propose a framework for the evaluating the models produced by distributed machine learning algorithms during the training process. This framework integrates with the cluster job scheduler so as to finalise model training under constraints of resource availability or time, or simply because model performance is asymptotic with further training. We present a prototype implementation of this framework using Apache Spark and YARN, and demonstrate the benefits of this approach using sample applications with both supervised and unsupervised learning algorithms.","PeriodicalId":177250,"journal":{"name":"2018 IEEE International Congress on Big Data (BigData Congress)","volume":"118 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE International Congress on Big Data (BigData Congress)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/BigDataCongress.2018.00027","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

The increase in the volume and variety of data has increased the reliance of data scientists on shared computational resources, either in-house or obtained via cloud providers, to execute machine learning and artificial intelligence programs. This, in turn, has created challenges of exploiting available resources to execute such "cognitive workloads" quickly and effectively to gather the needed knowledge and data insight. A common challenge in machine learning is knowing when to stop model building. This is often exacerbated in the presence of big data as a trade-off between the cost of producing the model (time, volume of training data, resources utilised) and its general performance. Whilst there are many tools and application stacks available to train models over distributed resources, the challenge of knowing when a model is "good enough" or no longer worth pursuing persists. In this paper, we propose a framework for evaluating the models produced by distributed machine learning algorithms during the training process. This framework integrates with the cluster job scheduler so as to finalise model training under constraints of resource availability or time, or simply because model performance is asymptotic with further training. We present a prototype implementation of this framework using Apache Spark and YARN, and demonstrate the benefits of this approach using sample applications with both supervised and unsupervised learning algorithms.
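The stopping behaviour the abstract describes can be illustrated with a short sketch. The Python fragment below is a minimal illustration under assumptions, not the paper's actual implementation: the `step()` and `score()` callables are hypothetical stand-ins for one distributed training increment (e.g. a Spark job submitted through YARN) and a held-out evaluation, and training stops when the metric improvement plateaus or a wall-clock budget is exhausted.

```python
# Minimal sketch (assumed interface, not the paper's code): evaluate the model
# periodically during training and stop when performance is asymptotic or a
# time budget runs out.

import time


def train_with_dynamic_evaluation(step, score, max_iters=100,
                                  time_budget_s=600.0,
                                  min_improvement=1e-3, patience=3):
    """Run `step()` repeatedly; stop on plateau or an exhausted time budget."""
    start = time.monotonic()
    best = float("-inf")
    stalled = 0
    for it in range(max_iters):
        step()                      # one training increment (e.g. a Spark job)
        current = score()           # evaluate the current model on held-out data
        if current - best < min_improvement:
            stalled += 1            # improvement below threshold: count as stalled
        else:
            stalled, best = 0, current
        if stalled >= patience:
            return it, "asymptotic performance"
        if time.monotonic() - start > time_budget_s:
            return it, "time budget exhausted"
    return max_iters, "max iterations reached"


if __name__ == "__main__":
    # Toy usage: a metric with diminishing returns triggers the plateau rule.
    state = {"iters": 0}
    def step(): state["iters"] += 1
    def score(): return 1.0 - 0.5 ** state["iters"]
    print(train_with_dynamic_evaluation(step, score))
```

In the setting described by the paper, such stopping decisions would be taken in coordination with the cluster job scheduler (YARN) rather than inside a local loop, so that resources can be released as soon as further training is no longer worthwhile.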