利用32位浮点运算的性能获得64位精度(再论线性系统的迭代细化)

J. Langou, J. Langou, P. Luszczek, J. Kurzak, A. Buttari, J. Dongarra
{"title":"利用32位浮点运算的性能获得64位精度(再论线性系统的迭代细化)","authors":"J. Langou, J. Langou, P. Luszczek, J. Kurzak, A. Buttari, J. Dongarra","doi":"10.1145/1188455.1188573","DOIUrl":null,"url":null,"abstract":"Recent versions of microprocessors exhibit performance characteristics for 32 bit floating point arithmetic (single precision) that is substantially higher than 64 bit floating point arithmetic (double precision). Examples include the Intel's Pentium IV and M processors, AMD's Opteron architectures and the IBM's Cell Broad Engine processor. When working in single precision, floating point operations can be performed up to two times faster on the Pentium and up to ten times faster on the Cell over double precision. The performance enhancements in these architectures are derived by accessing extensions to the basic architecture, such as SSE2 in the case of the Pentium and the vector functions on the IBM Cell. The motivation for this paper is to exploit single precision operations whenever possible and resort to double precision at critical stages while attempting to provide the full double precision results. The results described here are fairly general and can be applied to various problems in linear algebra such as solving large sparse systems, using direct or iterative methods and some eigenvalue problems. There are limitations to the success of this process, such as when the conditioning of the problem exceeds the reciprocal of the accuracy of the single precision computations. In that case the double precision algorithm should be used","PeriodicalId":333909,"journal":{"name":"ACM/IEEE SC 2006 Conference (SC'06)","volume":"08 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2006-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"159","resultStr":"{\"title\":\"Exploiting the Performance of 32 bit Floating Point Arithmetic in Obtaining 64 bit Accuracy (Revisiting Iterative Refinement for Linear Systems)\",\"authors\":\"J. Langou, J. Langou, P. Luszczek, J. Kurzak, A. Buttari, J. Dongarra\",\"doi\":\"10.1145/1188455.1188573\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recent versions of microprocessors exhibit performance characteristics for 32 bit floating point arithmetic (single precision) that is substantially higher than 64 bit floating point arithmetic (double precision). Examples include the Intel's Pentium IV and M processors, AMD's Opteron architectures and the IBM's Cell Broad Engine processor. When working in single precision, floating point operations can be performed up to two times faster on the Pentium and up to ten times faster on the Cell over double precision. The performance enhancements in these architectures are derived by accessing extensions to the basic architecture, such as SSE2 in the case of the Pentium and the vector functions on the IBM Cell. The motivation for this paper is to exploit single precision operations whenever possible and resort to double precision at critical stages while attempting to provide the full double precision results. The results described here are fairly general and can be applied to various problems in linear algebra such as solving large sparse systems, using direct or iterative methods and some eigenvalue problems. There are limitations to the success of this process, such as when the conditioning of the problem exceeds the reciprocal of the accuracy of the single precision computations. In that case the double precision algorithm should be used\",\"PeriodicalId\":333909,\"journal\":{\"name\":\"ACM/IEEE SC 2006 Conference (SC'06)\",\"volume\":\"08 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2006-11-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"159\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM/IEEE SC 2006 Conference (SC'06)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/1188455.1188573\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM/IEEE SC 2006 Conference (SC'06)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1188455.1188573","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 159

摘要

最新版本的微处理器显示出32位浮点运算(单精度)的性能特征,大大高于64位浮点运算(双精度)。例如英特尔的Pentium IV和M处理器,AMD的Opteron架构和IBM的Cell Broad Engine处理器。当以单精度工作时,在Pentium上执行浮点运算的速度可以提高两倍,在Cell上执行浮点运算的速度可以提高十倍。这些体系结构中的性能增强是通过访问基本体系结构的扩展而获得的,例如Pentium中的SSE2和IBM Cell上的矢量函数。本文的动机是尽可能利用单精度操作,并在关键阶段求助于双精度,同时试图提供完整的双精度结果。这里描述的结果是相当普遍的,可以应用于线性代数中的各种问题,如求解大型稀疏系统,使用直接或迭代方法以及一些特征值问题。这一过程的成功是有限制的,例如当问题的条件作用超过单精度计算精度的倒数时。在这种情况下,应该使用双精度算法
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Exploiting the Performance of 32 bit Floating Point Arithmetic in Obtaining 64 bit Accuracy (Revisiting Iterative Refinement for Linear Systems)
Recent versions of microprocessors exhibit performance characteristics for 32 bit floating point arithmetic (single precision) that is substantially higher than 64 bit floating point arithmetic (double precision). Examples include the Intel's Pentium IV and M processors, AMD's Opteron architectures and the IBM's Cell Broad Engine processor. When working in single precision, floating point operations can be performed up to two times faster on the Pentium and up to ten times faster on the Cell over double precision. The performance enhancements in these architectures are derived by accessing extensions to the basic architecture, such as SSE2 in the case of the Pentium and the vector functions on the IBM Cell. The motivation for this paper is to exploit single precision operations whenever possible and resort to double precision at critical stages while attempting to provide the full double precision results. The results described here are fairly general and can be applied to various problems in linear algebra such as solving large sparse systems, using direct or iterative methods and some eigenvalue problems. There are limitations to the success of this process, such as when the conditioning of the problem exceeds the reciprocal of the accuracy of the single precision computations. In that case the double precision algorithm should be used
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信