Towards Exploiting CPU Elasticity via Efficient Thread Oversubscription

Hang Huang, J. Rao, Song Wu, Hai Jin, Hong Jiang, Hao Che, Xiaofeng Wu
{"title":"Towards Exploiting CPU Elasticity via Efficient Thread Oversubscription","authors":"Hang Huang, J. Rao, Song Wu, Hai Jin, Hong Jiang, Hao Che, Xiaofeng Wu","doi":"10.1145/3431379.3460641","DOIUrl":null,"url":null,"abstract":"Elasticity is an essential feature of cloud computing, which allows users to dynamically add or remove resources in response to workload changes. However, building applications that truly exploit elasticity is non-trivial. Traditional applications need to be modified to efficiently utilize variable resources. This paper explores thread oversubscription, i.e., provisioning more threads than the available cores, to exploit CPU elasticity in the cloud. While maintaining sufficient concurrency allows applications to utilize additional CPUs when more are made available, it is widely believed that thread oversubscription introduces prohibitive overheads due to excessive context switches, loss of locality, and contention on shared resources. In this paper, we conduct a comprehensive study of the overhead of thread oversubscription. We find that 1) the direct cost of context switching (i.e., 1-2 μs on modern processors) does not cause noticeable performance slow down to most applications; 2) oversubscription can be both constructive and destructive to the performance of CPU caches and TLB. We identify two previously under-studied issues that are responsible for drastic slowdowns in many applications under oversubscription. First, the existing thread sleep and wakeup process in the OS kernel is inefficient in handling oversubscribed threads. Second, pervasive busy-waiting operations in program code can waste CPU and starve critical threads. To this end, we devise two OS mechanisms, virtual blocking and busy-waiting detection, to enable efficient thread oversubscription without requiring program code changes. Experimental results show that our approaches can achieve an efficiency close to that in under-subscribed scenarios while preserving the capability to expand to many more CPUs. The performance gain is up to 77% for blocking- and 19x for busy-waiting-based applications compared to the vanilla Linux.","PeriodicalId":343991,"journal":{"name":"Proceedings of the 30th International Symposium on High-Performance Parallel and Distributed Computing","volume":"38 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 30th International Symposium on High-Performance Parallel and Distributed Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3431379.3460641","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

Elasticity is an essential feature of cloud computing, which allows users to dynamically add or remove resources in response to workload changes. However, building applications that truly exploit elasticity is non-trivial. Traditional applications need to be modified to efficiently utilize variable resources. This paper explores thread oversubscription, i.e., provisioning more threads than the available cores, to exploit CPU elasticity in the cloud. While maintaining sufficient concurrency allows applications to utilize additional CPUs when more are made available, it is widely believed that thread oversubscription introduces prohibitive overheads due to excessive context switches, loss of locality, and contention on shared resources. In this paper, we conduct a comprehensive study of the overhead of thread oversubscription. We find that 1) the direct cost of context switching (i.e., 1-2 μs on modern processors) does not cause a noticeable performance slowdown for most applications; 2) oversubscription can be both constructive and destructive to the performance of CPU caches and the TLB. We identify two previously under-studied issues that are responsible for drastic slowdowns in many applications under oversubscription. First, the existing thread sleep and wakeup process in the OS kernel is inefficient in handling oversubscribed threads. Second, pervasive busy-waiting operations in program code can waste CPU and starve critical threads. To this end, we devise two OS mechanisms, virtual blocking and busy-waiting detection, to enable efficient thread oversubscription without requiring program code changes. Experimental results show that our approaches can achieve an efficiency close to that in under-subscribed scenarios while preserving the capability to expand to many more CPUs. The performance gain is up to 77% for blocking-based and 19x for busy-waiting-based applications compared to vanilla Linux.
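The abstract contrasts blocking synchronization (the kernel's sleep/wakeup path) with busy-waiting under oversubscription. The sketch below is not from the paper; it is a minimal illustration, using standard POSIX threads, of what "provisioning more threads than the available cores" looks like and of the two waiting styles the authors discuss: a busy-wait spin on a shared flag, which keeps the thread runnable and burns CPU time that other oversubscribed threads could use, versus a blocking wait on a condition variable, which releases the core until the thread is woken. The 4x oversubscription factor and the function names worker_spin and worker_block are illustrative assumptions only.

```c
// Minimal illustration (not the paper's code): oversubscribe a machine by
// spawning 4x as many threads as online cores, and compare two waiting
// styles: busy-waiting on a flag vs. blocking on a condition variable.
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static atomic_int ready = 0;                 // flag the spinning workers poll
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;

// Busy-waiting: the thread stays runnable and consumes its time slice while
// polling, potentially starving other oversubscribed threads of CPU.
static void *worker_spin(void *arg) {
    (void)arg;
    while (!atomic_load(&ready))
        ;                                    // spin until the flag is set
    return NULL;
}

// Blocking: the thread sleeps in the kernel and frees the core for other
// runnable threads until it is explicitly woken up.
static void *worker_block(void *arg) {
    (void)arg;
    pthread_mutex_lock(&lock);
    while (!atomic_load(&ready))
        pthread_cond_wait(&cond, &lock);     // sleep until signaled
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void) {
    long cores = sysconf(_SC_NPROCESSORS_ONLN);
    long nthreads = 4 * cores;               // 4x oversubscription (assumed factor)
    pthread_t *tid = malloc(sizeof(pthread_t) * (size_t)nthreads);

    for (long i = 0; i < nthreads; i++)
        pthread_create(&tid[i], NULL,
                       (i % 2) ? worker_spin : worker_block, NULL);

    sleep(1);                                // let the oversubscribed workers pile up

    pthread_mutex_lock(&lock);
    atomic_store(&ready, 1);                 // release all workers
    pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&lock);

    for (long i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
    free(tid);
    printf("ran %ld threads on %ld cores\n", nthreads, cores);
    return 0;
}
```

Per the abstract, the paper's virtual blocking and busy-waiting detection mechanisms target exactly these two paths inside the OS, so that neither the blocking nor the spinning code above would need to be modified to run efficiently when oversubscribed.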