Accurate floating-point operation using controlled floating-point precision

Proceedings of 2011 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing. Pub Date: 2011-10-03. DOI: 10.1109/PACRIM.2011.6032978

A. M. Zaki, Ayman M. Bahaa-Eldin, M. H. El-Shafey, G. Aly


Citations: 3

Abstract

Rounding and the accumulation of errors are important factors in floating-point computer arithmetic, and many applications suffer from them. The underlying machine architecture and the representation of floating-point numbers play the major role in the magnitude of the errors in this type of calculation. A quantitative measure of a system's error level is the machine epsilon. In current floating-point representations, the machine epsilon can be as small as 9.63E-35, in the 128-bit format of the IEEE floating-point standard. This work presents a novel solution that guarantees achieving the desired minimum error regardless of the machine architecture. The proposed model can achieve a machine epsilon of about 4.94E-324. A new representation model is given, and a complete arithmetic system with the basic operations is presented. The accuracy of the proposed method is verified by inverting a high-order Hilbert matrix, an ill-conditioned matrix that cannot be solved using the traditional floating-point standard. Finally, some comparisons are given.
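The two quantities the abstract relies on, machine epsilon and the ill-conditioning of the Hilbert matrix, can be illustrated with a minimal Python sketch. This is not the representation or arithmetic system proposed in the paper, which the abstract does not describe. As a stand-in, it compares ordinary double-precision arithmetic with exact rational arithmetic from Python's standard `fractions` module. The function names (`machine_epsilon`, `hilbert`, `invert`, `max_rel_error`) and the matrix order used in the demo are assumptions chosen for illustration. Two conventions for machine epsilon are in use: the loop below finds the gap between 1.0 and the next double (2^-52), whereas the 9.63E-35 quoted for the 128-bit format corresponds to 2^-113, half of that gap for a 113-bit significand.

```python
import sys
from fractions import Fraction


def machine_epsilon():
    """Gap between 1.0 and the next larger double, found by repeated halving."""
    eps = 1.0
    while 1.0 + eps / 2.0 > 1.0:
        eps /= 2.0
    return eps


def hilbert(n, exact=False):
    """n-by-n Hilbert matrix, H[i][j] = 1 / (i + j + 1) with 0-based indices."""
    one = Fraction(1) if exact else 1.0
    return [[one / (i + j + 1) for j in range(n)] for i in range(n)]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (works for float or Fraction)."""
    n = len(matrix)
    zero = matrix[0][0] * 0  # zero and one in the matrix's own number type
    one = zero + 1
    aug = [list(row) + [one if i == j else zero for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in the row with the largest pivot candidate in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def max_rel_error(approx, exact):
    """Largest relative error of the float inverse against the exact inverse."""
    return max(abs(a - float(e)) / abs(float(e))
               for row_a, row_e in zip(approx, exact)
               for a, e in zip(row_a, row_e))


if __name__ == "__main__":
    print("double epsilon:", machine_epsilon(), sys.float_info.epsilon)
    n = 10  # illustrative order, not the one used in the paper
    exact_inv = invert(hilbert(n, exact=True))
    float_inv = invert(hilbert(n))
    print("max relative error of the double-precision inverse:",
          max_rel_error(float_inv, exact_inv))
```

Exact rationals remove rounding error entirely at the cost of operands that grow with the matrix order, so the comparison only shows how much accuracy fixed-width doubles lose on this problem. It says nothing about the cost or accuracy of the controlled-precision model the paper proposes.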
