MiniFloats on RISC-V Cores: ISA Extensions With Mixed-Precision Short Dot Products

IF 5.1 | CAS Region 2 (Computer Science) | JCR Q1, COMPUTER SCIENCE, INFORMATION SYSTEMS
Luca Bertaccini;Gianna Paulin;Matheus Cavalcante;Tim Fischer;Stefan Mach;Luca Benini
{"title":"RISC-V 内核上的 MiniFloats:使用混合精度短点积的 ISA 扩展","authors":"Luca Bertaccini;Gianna Paulin;Matheus Cavalcante;Tim Fischer;Stefan Mach;Luca Benini","doi":"10.1109/TETC.2024.3365354","DOIUrl":null,"url":null,"abstract":"Low-precision floating-point (FP) formats have recently been intensely investigated in the context of machine learning inference and training applications. While 16-bit formats are already widely used, 8-bit FP data types have lately emerged as a viable option for neural network training when employed in a mixed-precision scenario and combined with rounding methods increasing the precision in compound additions, such as stochastic rounding. So far, hardware implementations supporting FP8 are mostly implemented within domain-specific accelerators. We propose two RISC-V instruction set architecture (ISA) extensions, enhancing respectively scalar and vector general-purpose cores with low and mixed-precision capabilities. The extensions support two 8-bit and two 16-bit FP formats and are based on dot-product instructions accumulating at higher precision. We develop a hardware unit supporting mixed-precision dot products and stochastic rounding and integrate it into an open-source floating-point unit (FPU). Finally, we integrate the enhanced FPU into a cluster of scalar cores, as well as a cluster of vector cores, and implement them in a 12 nm FinFET technology. The former achieves 575 GFLOPS/W on FP8-to-FP16 matrix multiplications at 0.8 V, 1.26 GHz; the latter reaches 860 GFLOPS/W at 0.8 V, 1.08 GHz, 1.93x higher efficiency than computing on FP16-to-FP32.","PeriodicalId":13156,"journal":{"name":"IEEE Transactions on Emerging Topics in Computing","volume":"12 4","pages":"1040-1055"},"PeriodicalIF":5.1000,"publicationDate":"2024-02-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"MiniFloats on RISC-V Cores: ISA Extensions With Mixed-Precision Short Dot Products\",\"authors\":\"Luca Bertaccini;Gianna Paulin;Matheus Cavalcante;Tim Fischer;Stefan Mach;Luca Benini\",\"doi\":\"10.1109/TETC.2024.3365354\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Low-precision floating-point (FP) formats have recently been intensely investigated in the context of machine learning inference and training applications. While 16-bit formats are already widely used, 8-bit FP data types have lately emerged as a viable option for neural network training when employed in a mixed-precision scenario and combined with rounding methods increasing the precision in compound additions, such as stochastic rounding. So far, hardware implementations supporting FP8 are mostly implemented within domain-specific accelerators. We propose two RISC-V instruction set architecture (ISA) extensions, enhancing respectively scalar and vector general-purpose cores with low and mixed-precision capabilities. The extensions support two 8-bit and two 16-bit FP formats and are based on dot-product instructions accumulating at higher precision. We develop a hardware unit supporting mixed-precision dot products and stochastic rounding and integrate it into an open-source floating-point unit (FPU). Finally, we integrate the enhanced FPU into a cluster of scalar cores, as well as a cluster of vector cores, and implement them in a 12 nm FinFET technology. 
The former achieves 575 GFLOPS/W on FP8-to-FP16 matrix multiplications at 0.8 V, 1.26 GHz; the latter reaches 860 GFLOPS/W at 0.8 V, 1.08 GHz, 1.93x higher efficiency than computing on FP16-to-FP32.\",\"PeriodicalId\":13156,\"journal\":{\"name\":\"IEEE Transactions on Emerging Topics in Computing\",\"volume\":\"12 4\",\"pages\":\"1040-1055\"},\"PeriodicalIF\":5.1000,\"publicationDate\":\"2024-02-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Emerging Topics in Computing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10440050/\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Emerging Topics in Computing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10440050/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0

Abstract

Low-precision floating-point (FP) formats have recently been intensely investigated in the context of machine learning inference and training applications. While 16-bit formats are already widely used, 8-bit FP data types have lately emerged as a viable option for neural network training when employed in a mixed-precision scenario and combined with rounding methods, such as stochastic rounding, that increase the precision of compound additions. So far, FP8 support in hardware has mostly been confined to domain-specific accelerators. We propose two RISC-V instruction set architecture (ISA) extensions, enhancing scalar and vector general-purpose cores, respectively, with low- and mixed-precision capabilities. The extensions support two 8-bit and two 16-bit FP formats and are based on dot-product instructions that accumulate at higher precision. We develop a hardware unit supporting mixed-precision dot products and stochastic rounding and integrate it into an open-source floating-point unit (FPU). Finally, we integrate the enhanced FPU into a cluster of scalar cores, as well as a cluster of vector cores, and implement them in a 12 nm FinFET technology. The former achieves 575 GFLOPS/W on FP8-to-FP16 matrix multiplications at 0.8 V, 1.26 GHz; the latter reaches 860 GFLOPS/W at 0.8 V, 1.08 GHz, 1.93x higher efficiency than computing on FP16-to-FP32.
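The abstract's two key mechanisms are (a) a short dot product whose narrow products are accumulated into a wider destination, and (b) stochastic rounding when results are stored back at reduced precision. The C sketch below illustrates both ideas under stated assumptions: plain float arithmetic stands in for the FP8/FP16 hardware formats, bfloat16 is used as the rounding target only because its 16-bit truncation is easy to express in software, and the function names (exsdotp, stochastic_round_bf16) are ours for illustration, not the paper's actual ISA mnemonics.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative stand-in for stochastic rounding: add random bits below
 * the cut point, then truncate to the target width. Here FP32 is rounded
 * to bfloat16 precision (top 16 bits); the paper's hardware targets
 * FP8/FP16 formats, but the add-noise-then-truncate principle is the same. */
static float stochastic_round_bf16(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits += (uint32_t)(rand() & 0xFFFF); /* random value below the kept bits */
    bits &= 0xFFFF0000u;                 /* keep sign, exponent, top 7 mantissa bits */
    float y;
    memcpy(&y, &bits, sizeof y);
    return y;
}

/* Sketch of the dot-product-with-wider-accumulation semantics the abstract
 * describes: two narrow ("FP8-like") products are summed into a wider
 * ("FP16-like") destination in one operation,
 *     d = a0*b0 + a1*b1 + c,
 * with products and sum computed at the wider precision, so no intermediate
 * result is rounded back to 8 bits. Plain floats stand in for the narrow
 * formats; this is not the actual instruction encoding. */
static float exsdotp(float a0, float a1, float b0, float b1, float c)
{
    return a0 * b0 + a1 * b1 + c; /* accumulate at the wider precision */
}

int main(void)
{
    srand(42);
    /* One step of an FP8-to-FP16-style matrix-multiply inner loop. */
    float acc = 0.0f;
    acc = exsdotp(0.5f, -1.25f, 2.0f, 0.75f, acc);
    printf("accumulator: %f\n", acc);
    printf("stochastically rounded: %f\n", stochastic_round_bf16(acc));
    return 0;
}

Accumulating at the wider precision matters because a long chain of FP8 additions would otherwise lose small addends to rounding at every step; stochastic rounding serves the same goal when the accumulator must eventually be stored back in a narrow format, keeping the rounding error unbiased in expectation.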
Source Journal
IEEE Transactions on Emerging Topics in Computing (Computer Science: Computer Science, miscellaneous)
CiteScore: 12.10
Self-citation rate: 5.10%
Annual article count: 113
Journal description: IEEE Transactions on Emerging Topics in Computing publishes papers on emerging aspects of computer science, computing technology, and computing applications not currently covered by other IEEE Computer Society Transactions. Some examples of emerging topics in computing include: IT for Green, Synthetic and organic computing structures and systems, Advanced analytics, Social/occupational computing, Location-based/client computer systems, Morphic computer design, Electronic game systems, and Health-care IT.