提高服务器效率面对杀手级微秒

2019 IEEE International Symposium on High Performance Computer Architecture (HPCA) Pub Date : 2019-02-01 DOI:10.1109/HPCA.2019.00037

Amirhossein Mirhosseini, Akshitha Sriraman, T. Wenisch

{"title":"提高服务器效率面对杀手级微秒","authors":"Amirhossein Mirhosseini, Akshitha Sriraman, T. Wenisch","doi":"10.1109/HPCA.2019.00037","DOIUrl":null,"url":null,"abstract":"We are entering an era of “killer microseconds” in data center applications. Killer microseconds refer to μs-scale “holes” in CPU schedules caused by stalls to access fast I/O devices or brief idle times between requests in high throughput microservices. Whereas modern computing platforms can efficiently hide ns-scale and ms-scale stalls through micro-architectural techniques and OS context switching, they lack efficient support to hide the latency of μs-scale stalls. Simultaneous Multithreading (SMT) is an efficient way to improve core utilization and increase server performance density. Unfortunately, scaling SMT to provision enough threads to hide frequent μs-scale stalls is prohibitive and SMT co-location can often drastically increase the tail latency of cloud microservices. In this paper, we propose Duplexity, a heterogeneous server architecture that employs aggressive multithreading to hide the latency of killer microseconds, without sacrificing the Quality-of-Service (QoS) of latency-sensitive microservices. Duplexity provisions dyads (pairs) of two kinds of cores: master-cores, which each primarily executes a single latency-critical master-thread, and lender-cores, which multiplex latency-insensitive throughput threads. When the master-thread stalls, the master-core borrows filler-threads from the lender-core, filling μs-scale utilization holes of the microservice. We propose critical mechanisms, including separate memory paths for the master-thread and filler-threads, to enable master-cores to borrow filler-threads while protecting master-threads’ state from disruption. Duplexity facilitates fast master-thread restart when stalls resolve and minimizes the microservice’s QoS violation. Our evaluation demonstrates that Duplexity is able to achieve 1.9× higher core utilization and 2.7× lower iso-throughput 99th-percentile tail latency over an SMT-based server design, on average.","PeriodicalId":102050,"journal":{"name":"2019 IEEE International Symposium on High Performance Computer Architecture (HPCA)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"34","resultStr":"{\"title\":\"Enhancing Server Efficiency in the Face of Killer Microseconds\",\"authors\":\"Amirhossein Mirhosseini, Akshitha Sriraman, T. Wenisch\",\"doi\":\"10.1109/HPCA.2019.00037\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We are entering an era of “killer microseconds” in data center applications. Killer microseconds refer to μs-scale “holes” in CPU schedules caused by stalls to access fast I/O devices or brief idle times between requests in high throughput microservices. Whereas modern computing platforms can efficiently hide ns-scale and ms-scale stalls through micro-architectural techniques and OS context switching, they lack efficient support to hide the latency of μs-scale stalls. Simultaneous Multithreading (SMT) is an efficient way to improve core utilization and increase server performance density. Unfortunately, scaling SMT to provision enough threads to hide frequent μs-scale stalls is prohibitive and SMT co-location can often drastically increase the tail latency of cloud microservices. In this paper, we propose Duplexity, a heterogeneous server architecture that employs aggressive multithreading to hide the latency of killer microseconds, without sacrificing the Quality-of-Service (QoS) of latency-sensitive microservices. Duplexity provisions dyads (pairs) of two kinds of cores: master-cores, which each primarily executes a single latency-critical master-thread, and lender-cores, which multiplex latency-insensitive throughput threads. When the master-thread stalls, the master-core borrows filler-threads from the lender-core, filling μs-scale utilization holes of the microservice. We propose critical mechanisms, including separate memory paths for the master-thread and filler-threads, to enable master-cores to borrow filler-threads while protecting master-threads’ state from disruption. Duplexity facilitates fast master-thread restart when stalls resolve and minimizes the microservice’s QoS violation. Our evaluation demonstrates that Duplexity is able to achieve 1.9× higher core utilization and 2.7× lower iso-throughput 99th-percentile tail latency over an SMT-based server design, on average.\",\"PeriodicalId\":102050,\"journal\":{\"name\":\"2019 IEEE International Symposium on High Performance Computer Architecture (HPCA)\",\"volume\":\"6 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-02-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"34\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 IEEE International Symposium on High Performance Computer Architecture (HPCA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HPCA.2019.00037\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE International Symposium on High Performance Computer Architecture (HPCA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCA.2019.00037","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 34

摘要

我们正在进入数据中心应用的“杀手级微秒”时代。杀手级微秒指的是由于访问快速I/O设备的停顿或高吞吐量微服务请求之间的短暂空闲时间导致的CPU调度中μs级的“漏洞”。尽管现代计算平台可以通过微架构技术和操作系统上下文切换有效地隐藏ns级和ms级的延迟，但它们缺乏对μ级延迟的有效支持。同步多线程(SMT)是提高核心利用率和提高服务器性能密度的有效方法。不幸的是，扩展SMT以提供足够的线程来隐藏频繁的μ级延迟是令人禁止的，并且SMT的协同定位通常会大大增加云微服务的尾部延迟。在本文中，我们提出了双工(Duplexity)，这是一种异构服务器架构，它采用积极的多线程来隐藏杀手级微秒的延迟，而不会牺牲延迟敏感微服务的服务质量(QoS)。双工性提供两种核的双(对):主核，每个主核主要执行一个延迟关键的主线程，和借出核，多路延迟不敏感的吞吐量线程。当主线程停止运行时，主核从借出核中借用填充线程，填补微服务μ级的利用率漏洞。我们提出了关键机制，包括主线程和填充线程的独立内存路径，以使主核能够借用填充线程，同时保护主线程的状态免受中断。双工性有助于在中断解决时快速重启主线程，并最大限度地减少微服务的QoS冲突。我们的评估表明，与基于smt的服务器设计相比，duplex能够实现1.9倍高的核心利用率和2.7倍低的等吞吐量第99百分位尾部延迟。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Enhancing Server Efficiency in the Face of Killer Microseconds

We are entering an era of “killer microseconds” in data center applications. Killer microseconds refer to μs-scale “holes” in CPU schedules caused by stalls to access fast I/O devices or brief idle times between requests in high throughput microservices. Whereas modern computing platforms can efficiently hide ns-scale and ms-scale stalls through micro-architectural techniques and OS context switching, they lack efficient support to hide the latency of μs-scale stalls. Simultaneous Multithreading (SMT) is an efficient way to improve core utilization and increase server performance density. Unfortunately, scaling SMT to provision enough threads to hide frequent μs-scale stalls is prohibitive and SMT co-location can often drastically increase the tail latency of cloud microservices. In this paper, we propose Duplexity, a heterogeneous server architecture that employs aggressive multithreading to hide the latency of killer microseconds, without sacrificing the Quality-of-Service (QoS) of latency-sensitive microservices. Duplexity provisions dyads (pairs) of two kinds of cores: master-cores, which each primarily executes a single latency-critical master-thread, and lender-cores, which multiplex latency-insensitive throughput threads. When the master-thread stalls, the master-core borrows filler-threads from the lender-core, filling μs-scale utilization holes of the microservice. We propose critical mechanisms, including separate memory paths for the master-thread and filler-threads, to enable master-cores to borrow filler-threads while protecting master-threads’ state from disruption. Duplexity facilitates fast master-thread restart when stalls resolve and minimizes the microservice’s QoS violation. Our evaluation demonstrates that Duplexity is able to achieve 1.9× higher core utilization and 2.7× lower iso-throughput 99th-percentile tail latency over an SMT-based server design, on average.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2019 IEEE International Symposium on High Performance Computer Architecture (HPCA)

自引率

0.00%

发文量