Measurements of errors in large-scale computational simulations at runtime
M. N. Dinh, Q. M. Nguyen
2020 RIVF International Conference on Computing and Communication Technologies (RIVF), 2020-10-01
DOI: 10.1109/RIVF48685.2020.9140785 (https://doi.org/10.1109/RIVF48685.2020.9140785)
Verification of simulation codes often involves comparing the simulation output behavior to a known model using graphical displays or statistical tests. Such a process is challenging for large-scale scientific codes at runtime because they often involve thousands of processes and generate very large data structures. In our earlier work, we proposed a statistical framework for testing the correctness of large-scale applications using their runtime data. This paper studies the concept of ‘distribution distance’ and establishes the requirements for measuring the runtime differences between a verified stochastic simulation system and its larger-scale counterpart. The paper discusses two types of distribution distance: the χ² distance and the histogram distance. We prototype the verification methodology and evaluate its performance on two production simulation programs. All experiments were conducted on a 20,000-core Cray XE6.
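The abstract does not give formulas for the two distances, so the following is only a minimal sketch of how a distribution distance between a verified run and a larger-scale run might be computed. It assumes a common symmetric form of the χ² distance, 0.5 · Σ (p_i − q_i)² / (p_i + q_i), and takes half the L1 difference between normalized histograms as one possible "histogram distance"; the paper's own definitions may differ. The function names and the shared bin edges are hypothetical and chosen for illustration.

```python
from bisect import bisect_right


def normalized_histogram(samples, edges):
    """Bin samples into len(edges) - 1 bins and scale the counts to sum to 1."""
    counts = [0] * (len(edges) - 1)
    for x in samples:
        if edges[0] <= x <= edges[-1]:
            # bisect_right locates the bin; clamp so x == edges[-1] falls in the last bin.
            i = min(bisect_right(edges, x) - 1, len(counts) - 1)
            counts[i] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no samples fall inside the bin edges")
    return [c / total for c in counts]


def chi2_distance(p, q):
    """Symmetric chi-square distance (assumed form); empty bin pairs are skipped."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)


def histogram_distance(p, q):
    """Half the L1 difference between two normalized histograms (assumed form)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical runtime samples from a verified run and a larger-scale run.
edges = [0.0, 1.0, 2.0]
reference = normalized_histogram([0.5, 0.7, 1.2, 1.8], edges)  # [0.5, 0.5]
scaled_up = normalized_histogram([0.5, 1.5, 1.5, 2.0], edges)  # [0.25, 0.75]
print(chi2_distance(reference, scaled_up))
print(histogram_distance(reference, scaled_up))
```

Both functions return 0 for identical histograms and 1 for histograms with no overlapping mass, so a threshold on either value could serve as a runtime check. Any such threshold would have to be calibrated against the verified system.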