Zhihong Zhang, Dan Meng, Jianfeng Zhan, Lei Wang, Yi Jin, Yu Wen, Hui Wang
Title: Gingko: correlating causal paths in distributed systems
Venue: 2007 IFIP International Conference on Network and Parallel Computing Workshops (NPC 2007)
Published: 2007-09-18
DOI: 10.1109/NPC.2007.46
Citations: 2
Abstract
Many large-scale systems are distributed systems composed of multiple communicating components. Finding the causal paths of message traces between components in these systems is important for uncovering runtime behavior and identifying the root causes of failures, but this "art" often resides only in the heads of developers or domain experts. Our goal is to design tools and algorithms that help developers record this art in logs and help modestly skilled users and system administrators master it, so that they can better use and manage distributed systems. In this paper, we present a methodology that automatically builds the causal paths of message traces through 1) an agreement with programmers on the style and content of the logs produced by the operational distributed systems they develop and 2) a correlation algorithm that builds message causal paths from the clues in these logs. To validate this methodology, we have implemented Gingko, a prototype that provides a tool chain for users to better understand distributed systems and to debug them efficiently when errors occur.
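The abstract does not specify the log format or the correlation algorithm, so the following is only an illustrative sketch of the general idea: if each log record follows an agreed convention, message causal paths can be rebuilt by linking records. Everything here is an assumption made for illustration and is not taken from Gingko: the record fields (`component`, `event`, `msg_id`, `cause`), the rule that a "send" record names the message whose handling triggered it, and the function `build_causal_paths`.

```python
from collections import defaultdict
from typing import List, NamedTuple, Optional


class LogRecord(NamedTuple):
    """One log line in a hypothetical agreed-upon format (not Gingko's)."""
    component: str        # component that wrote the record
    event: str            # "send" or "recv"
    msg_id: str           # id shared by the send and recv records of one message
    cause: Optional[str]  # on "send": id of the message being handled, if any


def build_causal_paths(records: List[LogRecord]) -> List[List[str]]:
    """Return every root-to-leaf chain of message ids found in the records."""
    children = defaultdict(list)  # cause msg_id -> msg_ids it triggered
    roots = []                    # messages sent without any recorded cause
    for rec in records:
        if rec.event != "send":
            continue  # only send records carry the causal clue in this sketch
        if rec.cause is None:
            roots.append(rec.msg_id)
        else:
            children[rec.cause].append(rec.msg_id)

    paths = []

    def walk(msg_id: str, prefix: List[str]) -> None:
        path = prefix + [msg_id]
        triggered = children.get(msg_id, [])
        if not triggered:
            paths.append(path)  # leaf: no further message was caused
        for child in triggered:
            walk(child, path)

    for root in roots:
        walk(root, [])
    return paths


# Hypothetical trace: a client request fans out to a database and a cache.
records = [
    LogRecord("client", "send", "m1", None),
    LogRecord("frontend", "recv", "m1", None),
    LogRecord("frontend", "send", "m2", "m1"),
    LogRecord("db", "recv", "m2", None),
    LogRecord("frontend", "send", "m3", "m1"),
    LogRecord("cache", "recv", "m3", None),
]
paths = build_causal_paths(records)
print(paths)
```

In this sketch the causal clue is written explicitly by the programmer, which mirrors the "agreement on log style and content" part of the methodology. A real correlation algorithm would also have to cope with missing records, clock skew, and interleaved requests, none of which is handled here.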