rIOMMU: Efficient IOMMU for I/O Devices that Employ Ring Buffers

M. Malka, Nadav Amit, Muli Ben-Yehuda, Dan Tsafrir
Published in: Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, 2015-03-14
DOI: 10.1145/2694344.2694355 (https://doi.org/10.1145/2694344.2694355)
Citation count: 44

Abstract

The IOMMU allows the OS to encapsulate I/O devices in their own virtual memory spaces, thus restricting their DMAs to specific memory pages. The OS uses the IOMMU to protect itself against buggy drivers and malicious or errant devices. But the added protection comes at a cost, degrading the throughput of I/O-intensive workloads by up to an order of magnitude. This cost has motivated system designers to trade off some safety for performance, e.g., by leaving stale information in the IOTLB for a while so as to amortize costly invalidations. We observe that high-bandwidth devices (like network and PCIe SSD controllers) interact with the OS via circular ring buffers that induce a sequential, predictable workload. We design a ring IOMMU (rIOMMU) that leverages this characteristic by replacing the virtual memory page table hierarchy with a circular, flat table. A flat table is adequately supported by exactly one IOTLB entry, making every new translation an implicit invalidation of the former and thus requiring explicit invalidations only at the end of I/O bursts. Using standard networking benchmarks, we show that rIOMMU provides up to 7.56x higher throughput relative to the baseline IOMMU, and that it is within 0.77-1.00x the throughput of a system without IOMMU protection.
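
The mechanism described in the abstract (a flat circular table served by a single IOTLB entry) can be illustrated with a toy software model. This is only a sketch under stated assumptions, not the paper's hardware design: the class `RingIommu`, its methods, and the `end_of_burst` flag are hypothetical names introduced here for illustration, and the model assumes the device touches ring slots strictly in order.

```python
class RingIommu:
    """Toy model (hypothetical): flat circular table plus a one-entry IOTLB."""

    def __init__(self, size):
        self.table = [None] * size        # flat table: slot -> physical page
        self.tail = 0                     # next slot the driver will map
        self.cached = None                # the single IOTLB entry: (slot, page)
        self.explicit_invalidations = 0   # counts explicit IOTLB flushes

    def map(self, page):
        """Driver side: map a page into the next ring slot."""
        slot = self.tail
        if self.table[slot] is not None:
            raise RuntimeError("ring full")
        self.table[slot] = page
        self.tail = (slot + 1) % len(self.table)
        return slot

    def translate(self, slot):
        """Device side: translate a slot, caching only the latest translation."""
        if self.cached is not None and self.cached[0] == slot:
            return self.cached[1]
        page = self.table[slot]
        if page is None:
            raise PermissionError(f"slot {slot} is not mapped")
        # Overwriting the single entry implicitly invalidates the previous one.
        self.cached = (slot, page)
        return page

    def unmap(self, slot, end_of_burst=False):
        """Driver side: unmap a slot; flush explicitly only when a burst ends."""
        self.table[slot] = None
        if end_of_burst:
            self.cached = None
            self.explicit_invalidations += 1


ring = RingIommu(4)
pages = [0x1000, 0x2000, 0x3000]
slots = [ring.map(p) for p in pages]
for i, slot in enumerate(slots):
    ring.translate(slot)                               # device DMA to this slot
    ring.unmap(slot, end_of_burst=(i == len(slots) - 1))
print(ring.explicit_invalidations)
```

In this model, a burst of three buffers costs one explicit invalidation rather than three, because each sequential translation displaces the cached entry of the slot before it; only the last entry of the burst has nothing to displace it and must be flushed explicitly.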