Accurate floating-point operation using controlled floating-point precision
A. M. Zaki, Ayman M. Bahaa-Eldin, M. H. El-Shafey, G. Aly
Proceedings of 2011 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing, 2011-10-03
DOI: 10.1109/PACRIM.2011.6032978 (https://doi.org/10.1109/PACRIM.2011.6032978)
Citations: 3
Abstract
Rounding and the accumulation of errors are important factors in floating-point computer arithmetic, and many applications suffer from these problems. The underlying machine architecture and the representation of floating-point numbers play the major role in the level and value of the errors in this type of calculation. A quantitative measure of a system's error level is the machine epsilon. In the current representation of floating-point numbers, the machine epsilon can be as small as 9.63E-35 in the 128-bit version of the IEEE standard floating-point representation. This work presents a novel solution that guarantees achieving the desired minimum error regardless of the machine architecture. The proposed model can achieve a machine epsilon of about 4.94E-324. A new representation model is given and a complete arithmetic system with basic operations is presented. The accuracy of the proposed method is verified by inverting a high-order Hilbert matrix, an ill-conditioned matrix that cannot be solved using the traditional floating-point standard. Finally, some comparisons are given.
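
The two quantities the abstract relies on, machine epsilon and the ill-conditioning of the Hilbert matrix, can be illustrated with a short Python sketch. It is not the representation proposed in the paper, which the abstract does not describe. As a stand-in for "arithmetic without rounding error" it uses exact rationals from the standard `fractions` module and compares the result with ordinary binary64 floats. The function names (`machine_epsilon`, `hilbert`, `invert`, `max_relative_error`) are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction
import sys


def machine_epsilon():
    """Return the gap between 1.0 and the next larger binary64 float."""
    eps = 1.0
    while 1.0 + eps / 2 != 1.0:
        eps /= 2
    return eps


def hilbert(n, one):
    """Build the n x n Hilbert matrix H[i][j] = 1 / (i + j + 1).

    `one` selects the arithmetic: 1.0 for floats, Fraction(1) for exact rationals.
    """
    return [[one / (i + j + 1) for j in range(n)] for i in range(n)]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (float or Fraction entries)."""
    n = len(matrix)
    number = type(matrix[0][0])
    one, zero = number(1), number(0)
    # Augment the matrix with the identity: [A | I].
    aug = [list(row) + [one if i == j else zero for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def max_relative_error(n):
    """Largest relative error of the float inverse against the exact inverse."""
    exact = invert(hilbert(n, Fraction(1)))
    approx = invert(hilbert(n, 1.0))
    worst = max(abs(Fraction(a) - e) / abs(e)
                for row_a, row_e in zip(approx, exact)
                for a, e in zip(row_a, row_e))
    return float(worst)


if __name__ == "__main__":
    print("binary64 epsilon (loop):", machine_epsilon())
    print("binary64 epsilon (sys): ", sys.float_info.epsilon)
    for n in (3, 6, 10):
        print(f"Hilbert n={n}: max relative error of float inverse =",
              max_relative_error(n))
```

Running the script shows the relative error of the float inverse growing rapidly with the matrix order, while the rational inverse stays exact. Exact rationals pay for this with numerators and denominators that grow without bound, which is the kind of trade-off a controlled-precision representation has to manage.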