Jakob Görgen, Vaastav Anand, Hejing Li, Jialin Li, Antoine Kaufmann
arXiv - CS - Performance, 2024-08-08. DOI: arxiv-2408.05251 (https://doi.org/arxiv-2408.05251)
Columbo: Low Level End-to-End System Traces through Modular Full-System Simulation
Fully understanding performance is a growing challenge when building
next-generation cloud systems. Often these systems build on next-generation
hardware, and evaluation in realistic physical testbeds is out of reach. Even
when physical testbeds are available, visibility into essential system aspects
is a challenge in modern systems where system performance depends on often
sub-$\mu$s interactions between HW and SW components. Existing tools such as
performance counters, logging, and distributed tracing provide aggregate or
sampled information, but remain insufficient for understanding individual
requests in depth. In this paper, we explore a fundamentally different approach
to enable in-depth understanding of cloud system behavior at the software and
hardware level, with (almost) arbitrarily fine-grained visibility. Our proposal
is to run cloud systems in detailed full-system simulations, configure the
simulators to collect detailed events without affecting the system, and finally
assemble these events into end-to-end system traces that can be analyzed by
existing distributed tracing tools.
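
To make the last step concrete, the following is a minimal sketch of how timestamped events from several simulator logs might be merged into spans. It is not the paper's implementation: the `Event` and `Span` types, the `request_id` correlation key, the picosecond timestamps, and the sample log contents are all assumptions made for illustration.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    ts_ps: int       # simulated timestamp in picoseconds (assumed unit)
    component: str   # which simulator emitted the event, e.g. "host" or "nic"
    name: str        # operation name
    kind: str        # "begin" or "end"
    request_id: int  # hypothetical key used to correlate events of one request


@dataclass(frozen=True)
class Span:
    request_id: int
    component: str
    name: str
    start_ps: int
    end_ps: int


def assemble_spans(*logs):
    """Merge per-simulator logs (each already sorted by time) and pair
    begin/end events into spans. Spans are returned in order of end time."""
    open_events = {}
    spans = []
    # heapq.merge lazily interleaves the sorted logs by timestamp.
    for ev in heapq.merge(*logs, key=lambda e: e.ts_ps):
        key = (ev.request_id, ev.component, ev.name)
        if ev.kind == "begin":
            open_events[key] = ev.ts_ps
        elif ev.kind == "end" and key in open_events:
            start = open_events.pop(key)
            spans.append(Span(ev.request_id, ev.component, ev.name, start, ev.ts_ps))
    return spans


# Made-up sample logs for one request crossing a host and a NIC simulator.
host_log = [
    Event(0, "host", "send_syscall", "begin", 1),
    Event(900_000, "host", "send_syscall", "end", 1),
]
nic_log = [
    Event(400_000, "nic", "dma_read", "begin", 1),
    Event(650_000, "nic", "dma_read", "end", 1),
]

spans = assemble_spans(host_log, nic_log)
for s in spans:
    print(s.request_id, s.component, s.name, s.end_ps - s.start_ps)
```

Spans in this form could then be converted to whatever format an existing distributed tracing tool ingests; that export step is omitted here because it depends on the chosen tool.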