Measurements of errors in large-scale computational simulations at runtime

2020 RIVF International Conference on Computing and Communication Technologies (RIVF). Pub Date: 2020-10-01. DOI: 10.1109/RIVF48685.2020.9140785

M. N. Dinh, Q. M. Nguyen


Citations: 0


Measurements of errors in large-scale computational simulations at runtime

Verification of simulation codes often involves comparing the simulation output behavior to a known model using graphical displays or statistical tests. Such a process is challenging for large-scale scientific codes at runtime because they often involve thousands of processes and generate very large data structures. In our earlier work, we proposed a statistical framework for testing the correctness of large-scale applications using their runtime data. This paper studies the concept of ‘distribution distance’ and establishes the requirements for measuring the runtime differences between a verified stochastic simulation system and its larger-scale counterpart. The paper discusses two types of distribution distance: the χ² distance and the histogram distance. We prototype the verification methodology and evaluate its performance on two production simulation programs. All experiments were conducted on a 20,000-core Cray XE6.
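The abstract does not give the paper's exact definitions of the two distances, so the following is only a minimal sketch of what comparing a verified run against a larger-scale run might look like. It assumes the common symmetric χ² form, 0.5 · Σ (p_i − q_i)² / (p_i + q_i), and uses half the L1 difference between normalized histograms as a stand-in for a "histogram distance". The function names, the shared bin edges, and the synthetic samples are all hypothetical and are not taken from the paper.

```python
import numpy as np


def normalized_histogram(samples, bin_edges):
    """Bin the samples on shared edges and normalize the counts to sum to 1."""
    counts, _ = np.histogram(samples, bins=bin_edges)
    total = counts.sum()
    if total == 0:
        raise ValueError("no samples fall inside the bin range")
    return counts / total


def chi2_distance(p, q):
    """Symmetric chi-squared distance (assumed form, not the paper's definition)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    denom = p + q
    mask = denom > 0  # skip bins that are empty in both histograms
    return 0.5 * np.sum((p[mask] - q[mask]) ** 2 / denom[mask])


def histogram_l1_distance(p, q):
    """Half the L1 difference between two normalized histograms (assumed form)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return 0.5 * np.sum(np.abs(p - q))


if __name__ == "__main__":
    # Synthetic stand-ins for runtime data from a verified run and a larger run.
    rng = np.random.default_rng(0)
    reference = rng.normal(loc=0.0, scale=1.0, size=10_000)
    candidate = rng.normal(loc=0.5, scale=1.0, size=10_000)

    edges = np.linspace(-5.0, 5.0, 41)  # both runs must share the same bins
    p = normalized_histogram(reference, edges)
    q = normalized_histogram(candidate, edges)

    print("chi2 distance:", chi2_distance(p, q))
    print("L1 histogram distance:", histogram_l1_distance(p, q))
```

With normalized inputs, both functions return 0 for identical histograms and 1 for histograms with no overlapping bins, which makes it easy to compare them against a tolerance threshold.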
