Columbo: Low Level End-to-End System Traces through Modular Full-System Simulation
Jakob Görgen, Vaastav Anand, Hejing Li, Jialin Li, Antoine Kaufmann
arXiv - CS - Performance, 2024-08-08, DOI: arxiv-2408.05251, https://doi.org/arxiv-2408.05251
Abstract
Fully understanding performance is a growing challenge when building
next-generation cloud systems. Often these systems build on next-generation
hardware, and evaluation in realistic physical testbeds is out of reach. Even
when physical testbeds are available, visibility into essential system aspects
is a challenge in modern systems, where performance depends on interactions
between HW and SW components that often last less than a microsecond. Existing tools such as
performance counters, logging, and distributed tracing provide aggregate or
sampled information, but remain insufficient for understanding individual
requests in depth. In this paper, we explore a fundamentally different approach
to enable in-depth understanding of cloud system behavior at the software and
hardware level, with (almost) arbitrarily fine-grained visibility. Our proposal
is to run cloud systems in detailed full-system simulations, configure the
simulators to collect detailed events without affecting the system, and finally
assemble these events into end-to-end system traces that can be analyzed by
existing distributed tracing tools.
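
To make the final assembly step concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each simulator writes a log of timestamped begin/end events and that events carry a shared request identifier; the names `Event`, `Span`, `assemble_traces`, and the `request_id` correlation key are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Event:
    """One timestamped event from a (hypothetical) simulator log."""
    ts_ns: int        # simulated time in nanoseconds
    component: str    # e.g. "host", "nic", "network"
    kind: str         # "begin" or "end"
    request_id: int   # assumed correlation key shared by all simulators


@dataclass
class Span:
    """A completed interval of work in one component for one request."""
    component: str
    request_id: int
    start_ns: int
    end_ns: int

    @property
    def duration_ns(self) -> int:
        return self.end_ns - self.start_ns


def assemble_traces(logs: List[List[Event]]) -> Dict[int, List[Span]]:
    """Merge per-simulator logs and pair begin/end events into spans."""
    # All simulators are assumed to share one simulated clock, so a global
    # sort by timestamp yields a consistent cross-component ordering.
    events = sorted((e for log in logs for e in log), key=lambda e: e.ts_ns)

    open_starts: Dict[Tuple[int, str], int] = {}
    traces: Dict[int, List[Span]] = {}
    for e in events:
        key = (e.request_id, e.component)
        if e.kind == "begin":
            open_starts[key] = e.ts_ns
        elif e.kind == "end" and key in open_starts:
            start = open_starts.pop(key)
            span = Span(e.component, e.request_id, start, e.ts_ns)
            traces.setdefault(e.request_id, []).append(span)

    # Order each trace by span start time for readability.
    for spans in traces.values():
        spans.sort(key=lambda s: s.start_ns)
    return traces


# Illustrative input: one request observed by a host and a NIC simulator.
host_log = [Event(0, "host", "begin", 1), Event(900, "host", "end", 1)]
nic_log = [Event(200, "nic", "begin", 1), Event(350, "nic", "end", 1)]
traces = assemble_traces([host_log, nic_log])
for span in traces[1]:
    print(span.component, span.start_ns, span.duration_ns)
```

In this sketch, spans are grouped per request and could then be exported in whatever format a distributed tracing tool expects; that export step, and how real simulators correlate events across components, are left out because the abstract does not specify them.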