Efficient Memory Virtualization: Reducing Dimensionality of Nested Page Walks

2014 47th Annual IEEE/ACM International Symposium on Microarchitecture. Pub Date: 2014-12-13. DOI: 10.1109/MICRO.2014.37

Jayneel Gandhi, Arkaprava Basu, M. Hill, M. Swift

Pages: 178-189

Citations: 93

Abstract

Virtualization provides value for many workloads, but its cost rises for workloads with poor memory access locality. This overhead comes from translation lookaside buffer (TLB) misses, where the hardware performs a 2D page walk (up to 24 memory references on x86-64) rather than a native page walk (up to only 4 memory references). The first dimension translates guest virtual addresses to guest physical addresses, while the second translates guest physical addresses to host physical addresses. This paper proposes new hardware using direct segments with three new virtualized modes of operation that significantly speed up virtualized address translation. Further, this paper proposes two novel techniques to address important limitations of the original direct segments. First, self-ballooning reduces fragmentation in physical memory and addresses the architectural input/output (I/O) gap in x86-64. Second, an escape filter provides alternate translations for exceptional pages within a direct segment (e.g., physical pages with permanent hard faults). We emulate the proposed hardware and prototype the software in Linux with KVM on x86-64. One mode, VMM Direct, reduces address translation overhead to near-native without guest application or OS changes (2% slower than native on average), while a more aggressive mode, Dual Direct, performs better than native on big-memory workloads, with near-zero translation overhead.
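
The 24-versus-4 figure quoted in the abstract follows from simple counting, and the sketch below works it out. It assumes 4-level page tables in both the guest and the host, as on x86-64 with 4 KB pages, and ignores page-walk caches and nested TLBs. The function names are illustrative and do not come from the paper.

```python
def native_walk_refs(levels: int = 4) -> int:
    """Worst-case memory references for a native (1D) page walk:
    one page-table entry read per level."""
    return levels


def nested_walk_refs(guest_levels: int = 4, host_levels: int = 4) -> int:
    """Worst-case memory references for a nested (2D) page walk.

    Assumption: no page-walk cache or nested TLB shortens the walk.
    """
    # Every guest page-table pointer is a guest physical address, so
    # locating each guest table costs a full host walk (host_levels
    # references) before the guest entry itself can be read (1 reference).
    per_guest_level = host_levels + 1
    # The guest walk ends with the guest physical address of the data,
    # which needs one more host walk to become a host physical address.
    final_translation = host_levels
    return guest_levels * per_guest_level + final_translation


if __name__ == "__main__":
    print(native_walk_refs())   # 4
    print(nested_walk_refs())   # 24
```

With n guest levels and m host levels the count is n(m + 1) + m, so the cost grows with the product of the two depths. That product is the "dimensionality" the proposed modes aim to reduce.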
