S. Ha, Wonyong Jeong, Gyorgy Matyasfalvi, C. Xie, K. Huck, J. Choi, A. Malik, Li Tang, H. V. Dam, Line C. Pouchard, W. Xu, Shinjae Yoo, N. D'Imperio, K. K. Dam
ISAV'20 In Situ Infrastructures for Enabling Extreme-Scale Analysis and Visualization, published 2020-08-31. DOI: https://doi.org/10.1145/3426462.3426465
Chimbuko: A Workflow-Level Scalable Performance Trace Analysis Tool
Due to the sheer volume of data, it is typically impractical to analyze the detailed performance of an HPC application running at scale. While conventional small-scale benchmarking and scaling studies are often sufficient for simple applications, many modern workflow-based applications couple multiple elements with competing resource demands and complex inter-communication patterns, so their performance cannot easily be studied in isolation or at small scale. This work discusses Chimbuko, a performance analysis framework that provides real-time, in situ anomaly detection. By focusing specifically on performance anomalies and their origin (that is, their provenance), Chimbuko dramatically reduces data volumes without losing necessary details. To the best of our knowledge, Chimbuko is the first online, distributed, and scalable workflow-level performance trace analysis framework. We demonstrate the tool's usefulness on Oak Ridge National Laboratory's Summit system.
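To illustrate how keeping only anomalous events can shrink a trace, the following is a minimal sketch of streaming outlier filtering over function runtimes. It is not Chimbuko's implementation: the abstract does not specify the detection algorithm, so the running mean and standard deviation test, the `(function_name, runtime)` event format, and the names `RunningStats`, `filter_anomalies`, `threshold`, and `warmup` are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Welford's online mean/variance for one function's runtimes."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the running mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self) -> float:
        # Sample standard deviation; 0.0 until at least two samples are seen.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def filter_anomalies(events, threshold=3.0, warmup=5):
    """Return only events whose runtime deviates strongly from the running mean.

    events: iterable of (function_name, runtime) pairs (hypothetical format).
    Each kept entry is (function_name, runtime, mean_before_event), so the
    normal baseline travels with the anomaly as a crude form of provenance.
    """
    stats = {}
    kept = []
    for name, runtime in events:
        s = stats.setdefault(name, RunningStats())
        std = s.std()
        # Flag only after a warm-up period, and only if the spread is nonzero.
        if s.n >= warmup and std > 0 and abs(runtime - s.mean) > threshold * std:
            kept.append((name, runtime, s.mean))
        # Assumption: every event, anomalous or not, updates the statistics.
        s.update(runtime)
    return kept


if __name__ == "__main__":
    trace = [("solve", t) for t in (1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 10.0, 1.0)]
    for name, runtime, baseline in filter_anomalies(trace):
        print(f"{name}: runtime={runtime} baseline_mean={baseline:.2f}")
```

In this sketch, eight trace events reduce to a single stored record, which is the kind of data reduction the abstract describes. Statistics are kept per function and updated in one pass, so nothing but the flagged events needs to be retained; a real system would also need to decide whether anomalous samples should be allowed to shift the baseline.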