An Instruction Inflation Analyzing Framework for Dynamic Binary Translators

IF 1.8 3区计算机科学 Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

ACM Transactions on Architecture and Code Optimization Pub Date : 2024-01-15 DOI:10.1145/3640813

Benyi Xie, Yue Yan, Chenghao Yan, Sicheng Tao, Zhuangzhuang Zhang, Xinyu Li, Yanzhi Lan, Xiang Wu, Tianyi Liu, Tingting Zhang, Fuxin Zhang

{"title":"An Instruction Inflation Analyzing Framework for Dynamic Binary Translators","authors":"Benyi Xie, Yue Yan, Chenghao Yan, Sicheng Tao, Zhuangzhuang Zhang, Xinyu Li, Yanzhi Lan, Xiang Wu, Tianyi Liu, Tingting Zhang, Fuxin Zhang","doi":"10.1145/3640813","DOIUrl":null,"url":null,"abstract":"<p>Dynamic binary translators (DBTs) are widely used to migrate applications between different instruction set architectures (ISAs). Despite extensive research to improve DBT performance, noticeable overhead remains, preventing near-native performance, especially when translating from complex instruction set computer (CISC) to reduced instruction set computer (RISC). For computational workloads, the main overhead stems from translated code quality. Experimental data show state-of-the-art DBT products have dynamic code inflation of at least 1.46. This indicates on average over 1.46 host instructions are needed to emulate one guest instruction. Worse, inflation closely correlates with translated code quality. However, the detailed sources of instruction inflation remain unclear. </p><p>To understand the sources of inflation, we present Deflater, an instruction inflation analysis framework comprising a mathematical model, a collection of black-box unit tests called BenchMIAOes, and a trace-based simulator called InflatSim. The mathematical model calculates overall inflation based on the inflation of individual instructions and translation block (TB) optimizations. BenchMIAOes extract model parameters from DBTs without accessing DBT source code. InflatSim implements the model and uses the extracted parameters from BenchMIAOes to simulate a given DBT’s behavior. Deflater is a valuable tool to guide DBT analysis and improvement. Using Deflater, we simulated inflation for three state-of-the-art CISC-to-RISC DBTs: ExaGear, Rosetta2, and LATX, with inflation errors of 5.63%, 5.15%, and 3.44% respectively for SPEC CPU 2017, gaining insights into these commercial DBTs. Deflater also efficiently models inflation for the open-source DBT QEMU and suggests optimizations that can substantially reduce inflation. Implementing the suggested optimizations confirms Deflater’s effective guidance, with 4.65% inflation error, and gains 5.47x performance improvement.</p>","PeriodicalId":50920,"journal":{"name":"ACM Transactions on Architecture and Code Optimization","volume":"22 1","pages":""},"PeriodicalIF":1.8000,"publicationDate":"2024-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Architecture and Code Optimization","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3640813","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}

引用次数: 0

Abstract

Dynamic binary translators (DBTs) are widely used to migrate applications between different instruction set architectures (ISAs). Despite extensive research to improve DBT performance, noticeable overhead remains, preventing near-native performance, especially when translating from complex instruction set computer (CISC) to reduced instruction set computer (RISC). For computational workloads, the main overhead stems from translated code quality. Experimental data show state-of-the-art DBT products have dynamic code inflation of at least 1.46. This indicates on average over 1.46 host instructions are needed to emulate one guest instruction. Worse, inflation closely correlates with translated code quality. However, the detailed sources of instruction inflation remain unclear.

To understand the sources of inflation, we present Deflater, an instruction inflation analysis framework comprising a mathematical model, a collection of black-box unit tests called BenchMIAOes, and a trace-based simulator called InflatSim. The mathematical model calculates overall inflation based on the inflation of individual instructions and translation block (TB) optimizations. BenchMIAOes extract model parameters from DBTs without accessing DBT source code. InflatSim implements the model and uses the extracted parameters from BenchMIAOes to simulate a given DBT’s behavior. Deflater is a valuable tool to guide DBT analysis and improvement. Using Deflater, we simulated inflation for three state-of-the-art CISC-to-RISC DBTs: ExaGear, Rosetta2, and LATX, with inflation errors of 5.63%, 5.15%, and 3.44% respectively for SPEC CPU 2017, gaining insights into these commercial DBTs. Deflater also efficiently models inflation for the open-source DBT QEMU and suggests optimizations that can substantially reduce inflation. Implementing the suggested optimizations confirms Deflater’s effective guidance, with 4.65% inflation error, and gains 5.47x performance improvement.

查看原文本刊更多论文

动态二进制翻译器的指令膨胀分析框架

动态二进制转换器（DBT）被广泛用于在不同指令集架构（ISA）之间迁移应用程序。尽管为提高 DBT 性能进行了大量研究，但仍存在明显的开销，无法实现接近原生的性能，尤其是从复杂指令集计算机 (CISC) 转换到精简指令集计算机 (RISC) 时更是如此。对于计算工作负载，主要开销来自翻译代码的质量。实验数据显示，最先进的 DBT 产品的动态代码膨胀率至少为 1.46。这表明模拟一条客户指令平均需要超过 1.46 条主机指令。更糟糕的是，膨胀与翻译代码质量密切相关。然而，指令膨胀的详细来源仍不清楚。为了了解膨胀的来源，我们提出了一个指令膨胀分析框架 Deflater，该框架由一个数学模型、一组名为 BenchMIAOes 的黑盒单元测试和一个名为 InflatSim 的基于跟踪的模拟器组成。数学模型根据单条指令的膨胀和翻译块 (TB) 优化计算整体膨胀。BenchMIAOes 从 DBT 中提取模型参数，无需访问 DBT 源代码。InflatSim 实现了该模型，并使用从 BenchMIAOes 提取的参数来模拟给定 DBT 的行为。Deflater 是指导 DBT 分析和改进的重要工具。利用 Deflater，我们模拟了三种最先进的 CISC 转 RISC DBT 的膨胀情况：在 SPEC CPU 2017 中，ExaGear、Rosetta2 和 LATX 的膨胀误差分别为 5.63%、5.15% 和 3.44%，从而深入了解了这些商用 DBT。Deflater 还对开源 DBT QEMU 的膨胀进行了有效建模，并提出了可大幅降低膨胀的优化建议。实施建议的优化措施证实了 Deflater 的有效指导，膨胀误差为 4.65%，性能提高了 5.47 倍。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

ACM Transactions on Architecture and Code Optimization 工程技术-计算机：理论方法

CiteScore

3.60

自引率

6.20%

发文量

审稿时长

6-12 weeks

期刊介绍： ACM Transactions on Architecture and Code Optimization (TACO) focuses on hardware, software, and system research spanning the fields of computer architecture and code optimization. Articles that appear in TACO will either present new techniques and concepts or report on experiences and experiments with actual systems. Insights useful to architects, hardware or software developers, designers, builders, and users will be emphasized.