Hunting Killer Tasks for Cloud System through Machine Learning: A Google Cluster Case Study

Hongyan Tang, Ying Li, Tong Jia, Zhonghai Wu
Published in: IEEE International Conference on Software Quality, Reliability and Security: proceedings, pp. 1-12
Publication date: 2016-08-01
DOI: 10.1109/QRS.2016.11 (https://doi.org/10.1109/QRS.2016.11)
Citations: 2

Abstract

Motivated by frequent failures in cloud computing systems, we analyze the failure frequency and failure continuity of tasks in the Google cloud cluster and identify what we call killer tasks: tasks that suffer from frequent failures and repeated rescheduling. Killer tasks waste resources unnecessarily and significantly increase scheduling workload, which can be a major concern in cloud systems. We aim to recognize killer tasks at a very early stage of their occurrence so that they can be addressed proactively instead of being rescheduled repeatedly, thereby improving reliability and saving resources. Recognizing killer tasks among a large number of tasks in real time is challenging. In this paper, we first investigate the characteristics and behavior patterns of killer tasks and then develop two machine-learning-based methods, K-HUNTER and C-HUNTER, for online recognition of killer tasks. The empirical results show that our approach achieves 97% precision in recognizing killer tasks, with an 89% timing advance and 88% resource saving for the cloud system on average.
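To make the two quantities named in the abstract concrete, the following is a minimal sketch of how failure frequency and failure continuity might be computed per task from a time-ordered event log. It is not K-HUNTER or C-HUNTER, which the abstract does not describe. The event names (`SCHEDULE`, `FAIL`, `FINISH`), the tuple layout, the reading of "continuity" as the longest run of consecutive failures, and the `min_failures` threshold are all assumptions made for illustration.

```python
from collections import defaultdict


def failure_stats(events):
    """Return {task_id: (total_failures, longest_consecutive_failure_run)}.

    `events` is a time-ordered iterable of (task_id, event_type) pairs.
    The event type names are hypothetical, not taken from the paper.
    Tasks that never fail are omitted from the result.
    """
    total = defaultdict(int)
    current_run = defaultdict(int)
    longest_run = defaultdict(int)
    for task_id, event_type in events:
        if event_type == "FAIL":
            total[task_id] += 1
            current_run[task_id] += 1
            longest_run[task_id] = max(longest_run[task_id], current_run[task_id])
        elif event_type == "FINISH":
            # A successful completion breaks the run of consecutive failures.
            current_run[task_id] = 0
    return {t: (total[t], longest_run[t]) for t in total}


def flag_candidates(stats, min_failures=3):
    """Flag tasks whose failure count reaches an illustrative threshold.

    `min_failures` is an arbitrary assumption, not a value from the paper.
    """
    return sorted(t for t, (count, _) in stats.items() if count >= min_failures)


# Toy event log (fabricated for illustration only, not trace data).
events = [
    ("a", "SCHEDULE"), ("a", "FAIL"),
    ("b", "SCHEDULE"), ("b", "FINISH"),
    ("a", "SCHEDULE"), ("a", "FAIL"),
    ("a", "SCHEDULE"), ("a", "FAIL"),
    ("c", "SCHEDULE"), ("c", "FAIL"),
    ("c", "SCHEDULE"), ("c", "FINISH"),
]
stats = failure_stats(events)
candidates = flag_candidates(stats)
print(stats)
print(candidates)
```

A counting rule like this can only flag a task after it has already failed repeatedly. The abstract's stated goal is earlier recognition, which is why the authors turn to learned models rather than a fixed threshold.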