rIOMMU: Efficient IOMMU for I/O Devices that Employ Ring Buffers

Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems. Pub Date: 2015-03-14. DOI: 10.1145/2694344.2694355

M. Malka, Nadav Amit, Muli Ben-Yehuda, Dan Tsafrir


Citations: 44

Abstract

The IOMMU allows the OS to encapsulate I/O devices in their own virtual memory spaces, thus restricting their DMAs to specific memory pages. The OS uses the IOMMU to protect itself against buggy drivers and malicious/errant devices. But the added protection comes at a cost, degrading the throughput of I/O-intensive workloads by up to an order of magnitude. This cost has motivated system designers to trade off some safety for performance, e.g., by leaving stale information in the IOTLB for a while so as to amortize costly invalidations. We observe that high-bandwidth devices, like network and PCIe SSD controllers, interact with the OS via circular ring buffers that induce a sequential, predictable workload. We design a ring IOMMU (rIOMMU) that leverages this characteristic by replacing the virtual memory page table hierarchy with a circular, flat table. A flat table is adequately supported by exactly one IOTLB entry, making every new translation an implicit invalidation of the former and thus requiring explicit invalidations only at the end of I/O bursts. Using standard networking benchmarks, we show that rIOMMU provides up to 7.56x higher throughput relative to the baseline IOMMU, and that it is within 0.77-1.00x the throughput of a system without IOMMU protection.
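The flat-table idea in the abstract can be illustrated with a small software model. The sketch below is only a toy under stated assumptions, not the hardware design from the paper: the class `RingTable`, its methods, and the page numbers are all hypothetical names chosen for illustration. It shows a circular table backed by a single cached translation, where each new translation overwrites the cached one (the implicit invalidation) and an explicit invalidation is issued once, at the end of a burst.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class RingTable:
    """Toy model of a flat, circular translation table (hypothetical, for illustration)."""
    size: int
    # slot -> physical page number, or None when the slot is unmapped
    entries: List[Optional[int]] = field(init=False)
    # the single cached translation, standing in for the one IOTLB entry
    cached: Optional[Tuple[int, int]] = None
    explicit_invalidations: int = 0

    def __post_init__(self) -> None:
        self.entries = [None] * self.size

    def map(self, slot: int, phys_page: int) -> None:
        self.entries[slot % self.size] = phys_page

    def unmap(self, slot: int) -> None:
        # No flush here: the cached entry may briefly be stale, and it is
        # overwritten by the next translation or cleared by end_of_burst().
        self.entries[slot % self.size] = None

    def translate(self, slot: int) -> int:
        slot %= self.size
        if self.cached is not None and self.cached[0] == slot:
            return self.cached[1]
        phys = self.entries[slot]
        if phys is None:
            raise PermissionError(f"DMA to unmapped ring slot {slot}")
        # Overwriting the single cached entry implicitly invalidates the old one.
        self.cached = (slot, phys)
        return phys

    def end_of_burst(self) -> None:
        # The only explicit invalidation in this model.
        self.cached = None
        self.explicit_invalidations += 1


# A burst of 8 sequential DMAs over a 4-slot ring (indices wrap around).
ring = RingTable(size=4)
for i in range(8):
    ring.map(i, 0x1000 + i)
    assert ring.translate(i) == 0x1000 + i
    ring.unmap(i)
ring.end_of_burst()
print(ring.explicit_invalidations)
```

In this model the burst of eight map/translate/unmap cycles ends with a single explicit invalidation, whereas a scheme that flushes on every unmap would issue eight. The model deliberately ignores everything else a real IOMMU does (permissions, multiple rings, concurrency), so it should be read as an intuition aid only.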


