The Impact of Thread-Per-Core Architecture on Application Tail Latency

2019 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS), Pub Date: 2019-09-01, DOI: 10.1109/ANCS.2019.8901874

Pekka Enberg, Ashwin Rao, S. Tarkoma

Citations: 5

Abstract

The Impact of Thread-Per-Core Architecture on Application Tail Latency

The response time of an online service depends on the tail latency of a few of the applications it invokes in parallel to satisfy requests. Individual applications are composed of one or more threads to fully utilize the available CPU cores, but this approach can incur serious overheads. The thread-per-core architecture has emerged to reduce these overheads, but it brings its own challenges in thread synchronization and OS interfaces. Applications can mitigate both issues with different techniques, but the impact of those techniques on application tail latency is an open question. We measure the impact of the thread-per-core architecture on application tail latency by implementing a key-value store that uses application-level partitioning and inter-thread messaging, and we compare its tail latency to that of Memcached, which uses a traditional key-value store design. Our experimental evaluation shows that our approach reduces tail latency by up to 71% compared to baseline Memcached running on commodity hardware and Linux. However, we observe that the thread-per-core approach is held back by request steering and OS interfaces, and that it could be further improved with NIC hardware offload.
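The two techniques named in the abstract, application-level partitioning and inter-thread messaging, can be illustrated with a minimal sketch. This is not the authors' implementation: the class names `Shard` and `PartitionedStore`, the CRC32-based key-to-shard mapping, and the use of Python queues as the message channel are all assumptions made for illustration. Python threads are also not pinned to CPU cores here, so the sketch shows only the ownership and messaging structure, not the performance characteristics the paper measures.

```python
import queue
import threading
import zlib


class Shard(threading.Thread):
    """Hypothetical worker thread that exclusively owns one key partition."""

    def __init__(self, shard_id):
        super().__init__(daemon=True)
        self.shard_id = shard_id
        self.inbox = queue.Queue()  # inter-thread message channel
        self.data = {}              # touched only by this thread, so no lock

    def run(self):
        while True:
            msg = self.inbox.get()
            if msg is None:  # shutdown sentinel
                break
            op, key, value, reply = msg
            if op == "set":
                self.data[key] = value
                reply.put(True)
            else:  # treat anything else as "get"
                reply.put(self.data.get(key))


class PartitionedStore:
    """Steers each request to the shard that owns the key (assumed design)."""

    def __init__(self, num_shards):
        self.shards = [Shard(i) for i in range(num_shards)]
        for shard in self.shards:
            shard.start()

    def owner(self, key):
        # Stable hash, so a given key always maps to the same shard.
        return self.shards[zlib.crc32(key.encode()) % len(self.shards)]

    def _request(self, op, key, value=None):
        reply = queue.Queue(maxsize=1)
        self.owner(key).inbox.put((op, key, value, reply))
        return reply.get()  # block until the owning thread answers

    def set(self, key, value):
        return self._request("set", key, value)

    def get(self, key):
        return self._request("get", key)

    def close(self):
        for shard in self.shards:
            shard.inbox.put(None)
        for shard in self.shards:
            shard.join()
```

A caller would create the store with one shard per core it intends to use, for example `store = PartitionedStore(4)`, then call `store.set("a", 1)` and `store.get("a")`. Because each dictionary is read and written by exactly one thread, the data path needs no locks; the cost moves to steering each request to the owning thread, which is the kind of overhead the abstract identifies as a limiting factor.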
