Enhancing Atomic Instruction Emulation for Cross-ISA Dynamic Binary Translation

Ziyi Zhao, Zhang Jiang, Ying Chen, Xiaoli Gong, Wenwen Wang, P. Yew
{"title":"Enhancing Atomic Instruction Emulation for Cross-ISA Dynamic Binary Translation","authors":"Ziyi Zhao, Zhang Jiang, Ying Chen, Xiaoli Gong, Wenwen Wang, P. Yew","doi":"10.1109/CGO51591.2021.9370312","DOIUrl":null,"url":null,"abstract":"Dynamic Binary Translation (DBT) is a key enabler for cross-ISA emulation, system virtualization, runtime instrumentation, and many other important applications. Among several critical requirements for DBT, it is important to provide equivalent semantics for atomic synchronization instructions such as Load - Link / Store - Conditional (LL/SC), which are mostly included in the reduced-instruction set architectures (RISC) and Compare-and-Swap(CAS), which is mostly in the complex instruction set architectures (CISC). However, the state-of-the-art DBT tools often do not provide a fully correct translation of these atomic instructions, in particular, from RISC atomic instructions (i.e. LL/SC) to CISC atomic instructions (i.e. CAS), due to performance concerns. As a result, some may cause the well-known ABA problem, which could lead to wrong results or program crashes. In our experimental studies on QEMU, a state-of-the-art DBT, that runs multi-threaded lock-free stack operations implemented with ARM instruction set (i.e. using LL/SC) on Intel x86 platforms (i.e. using CAS), it often crashes within 2 seconds. Although attempts have been made to provide correct emulation for such atomic instructions, they either result in heavy execution overheads or require additional hardware support. In this paper, we propose several schemes to address those issues and implement them on QEMU to evaluate their performance overheads. The results show that all of the proposed schemes can provide correct emulation and, for the best solution, can achieve a min, max, geomean speedup of 1.25x, 3.21x, 2.03x respectively, over the best existing software-based scheme.","PeriodicalId":275062,"journal":{"name":"2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)","volume":"154 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CGO51591.2021.9370312","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

Abstract

Dynamic Binary Translation (DBT) is a key enabler for cross-ISA emulation, system virtualization, runtime instrumentation, and many other important applications. Among several critical requirements for DBT, it is important to provide equivalent semantics for atomic synchronization instructions such as Load - Link / Store - Conditional (LL/SC), which are mostly included in the reduced-instruction set architectures (RISC) and Compare-and-Swap(CAS), which is mostly in the complex instruction set architectures (CISC). However, the state-of-the-art DBT tools often do not provide a fully correct translation of these atomic instructions, in particular, from RISC atomic instructions (i.e. LL/SC) to CISC atomic instructions (i.e. CAS), due to performance concerns. As a result, some may cause the well-known ABA problem, which could lead to wrong results or program crashes. In our experimental studies on QEMU, a state-of-the-art DBT, that runs multi-threaded lock-free stack operations implemented with ARM instruction set (i.e. using LL/SC) on Intel x86 platforms (i.e. using CAS), it often crashes within 2 seconds. Although attempts have been made to provide correct emulation for such atomic instructions, they either result in heavy execution overheads or require additional hardware support. In this paper, we propose several schemes to address those issues and implement them on QEMU to evaluate their performance overheads. The results show that all of the proposed schemes can provide correct emulation and, for the best solution, can achieve a min, max, geomean speedup of 1.25x, 3.21x, 2.03x respectively, over the best existing software-based scheme.
增强跨isa动态二进制转换的原子指令仿真
动态二进制转换(DBT)是跨isa仿真、系统虚拟化、运行时检测和许多其他重要应用程序的关键支持因素。在DBT的几个关键需求中,重要的是为原子同步指令(如Load - Link / Store - Conditional (LL/SC))提供等效语义,这些指令主要包含在精简指令集体系结构(RISC)和比较与交换(CAS)中,它们主要存在于复杂指令集体系结构(CISC)中。然而,由于性能问题,最先进的DBT工具通常不能提供这些原子指令的完全正确的转换,特别是从RISC原子指令(即LL/SC)到CISC原子指令(即CAS)。因此,有些可能会导致众所周知的ABA问题,这可能导致错误的结果或程序崩溃。在我们对QEMU的实验研究中,QEMU是一种最先进的DBT,在Intel x86平台(即使用CAS)上运行使用ARM指令集(即使用LL/SC)实现的多线程无锁堆栈操作,它经常在2秒内崩溃。尽管已经尝试为这些原子指令提供正确的模拟,但它们要么导致沉重的执行开销,要么需要额外的硬件支持。在本文中,我们提出了几个方案来解决这些问题,并在QEMU上实现它们,以评估它们的性能开销。结果表明,所提方案均能提供正确的仿真结果,且与现有的最佳软件方案相比,最优方案的最小、最大和几何速度分别提高1.25倍、3.21倍和2.03倍。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信