Toward Increasing Trust in Exascale Simulations
Dorra Ben Khalifa, Xinyi Li, I. Laguna, M. Martel, G. Gopalakrishnan
2022 4th Annual Workshop on Extreme-scale Experiment-in-the-Loop Computing (XLOOP), published 2022-11-01
DOI: 10.1109/XLOOP56614.2022.00010 (https://doi.org/10.1109/XLOOP56614.2022.00010)
In recent decades, High Performance Computing (HPC) and simulation have become decisive in many areas of engineering and science. Because many HPC applications rely extensively on floating-point arithmetic, numerical errors of many kinds can be introduced during program execution, leading to instability or reproducibility problems. One such source of error is cancellation, the loss of significant digits that produces inaccurate results when two nearby numbers are subtracted. In this article, we present Candy, a new dynamic library based on code instrumentation that detects cancellations in numerical software. The originality of our method is that it computes the number of significant bits of floating-point numbers in a generalized framework by attaching a higher-precision shadow value to each number. This makes it possible to detect accurately whether a program suffers from cancellation problems, and thus to increase trust in large-scale HPC applications and exascale simulations. We evaluate Candy on a set of complex, real-world numerical applications. In addition, we compare our method against the state-of-the-art tool FPChecker in terms of efficiency, mixed-precision results, and analysis speed.
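The shadow-value idea described in the abstract can be illustrated with a small sketch. This is not Candy's implementation or API: the class `Shadowed`, its method `significant_bits`, and the constant `MANTISSA_BITS` are hypothetical names introduced here, and an exact rational (`fractions.Fraction`) stands in for the higher-precision shadow, which is an assumption made only to keep the example self-contained. Inputs are taken as exact, so only the rounding error introduced by the operations themselves is measured.

```python
import math
from fractions import Fraction

MANTISSA_BITS = 53  # significand precision of an IEEE 754 binary64 float


class Shadowed:
    """A float paired with a higher-precision shadow (hypothetical helper).

    The shadow is an exact rational here; a real tool would more likely
    use a wider floating-point format. Finite inputs only.
    """

    def __init__(self, value, shadow=None):
        self.value = float(value)
        # Assumption: an input's shadow starts as the exact value of the float.
        self.shadow = Fraction(self.value) if shadow is None else shadow

    def __add__(self, other):
        # The float result is rounded; the shadow result is exact.
        return Shadowed(self.value + other.value, self.shadow + other.shadow)

    def __sub__(self, other):
        return Shadowed(self.value - other.value, self.shadow - other.shadow)

    def significant_bits(self):
        """Estimate how many leading bits of `value` agree with the shadow."""
        error = abs(Fraction(self.value) - self.shadow)
        if error == 0:
            return MANTISSA_BITS
        if self.shadow == 0:
            return 0
        rel = error / abs(self.shadow)
        # -log2(relative error), computed on integers to avoid float underflow.
        bits = math.log2(rel.denominator) - math.log2(rel.numerator)
        return max(0, min(MANTISSA_BITS, math.floor(bits)))


if __name__ == "__main__":
    one = Shadowed(1.0)
    tiny = Shadowed(1e-8)
    # Subtracting two nearby numbers: (1 + 1e-8) - 1 loses many leading bits.
    diff = (one + tiny) - one
    print("float result:    ", diff.value)
    print("significant bits:", diff.significant_bits(), "of", MANTISSA_BITS)
```

In this sketch, a result whose estimated significant-bit count falls well below 53 would be flagged as a likely cancellation; the threshold for reporting is a design choice left open here.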