Toward accurate and practical network tomography

ACM SIGOPS Oper. Syst. Rev. Pub Date : 2013-01-29 DOI:10.1145/2433140.2433146

Denisa Ghita, K. Argyraki, Patrick Thiran


Citations: 15

Abstract

Troubleshooting large networks is hard; when an end-user complains that she has “network problems,” there is typically a large number of possible causes. For example, the end-user’s own machine may be damaged, misconfigured, or compromised; a network element that handles her traffic may be congested or malfunctioning; or the destination she is trying to reach may be filtering her traffic. To diagnose such problems, a network operator normally has to probe the network’s elements to collect relevant statistics, like packet loss or bandwidth utilization. The challenge, though, is that the network operator often does not have direct access to all the suspected network elements and hence cannot probe them; for example, the operator of an edge network does not have access to the equipment of her Internet service provider (ISP). Network tomography is an elegant approach to network troubleshooting: just as medical tomography observes an organ from different vantage points and combines the observations to learn about the organ’s internals (without dissecting it), so does network tomography observe the characteristics of different end-to-end network paths and combine the observations to infer the characteristics of individual network links (without probing them). This approach is applicable in scenarios where one needs to monitor the behavior and performance of a network without having direct access to its elements. For instance, the operators of edge networks could use network tomography to monitor the behavior and performance of their ISPs; an ISP operator could use it to monitor the behavior and performance of its peers. However, there are reasons to be skeptical about the usefulness of network tomography in practice. Even though it was invented more than 10 years ago and is still a topic of active research, it has not seen any real deployment. We believe the reason is that existing tomography algorithms make certain simplifying assumptions that do not always hold in a real network, which means that the algorithms’ results may be inaccurate. Most importantly, there is no way to determine the extent of this inaccuracy. In other words, today there is no way for a network operator who employs tomography for network troubleshooting to compute the certainty of the resulting diagnosis.
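
To make the idea of inferring link characteristics from end-to-end path observations concrete, here is a minimal Python sketch of one simple variant, often called Boolean tomography. It is not the authors' algorithm. The topology, the path and link names, and the function `infer_suspect_links` are all hypothetical, and the sketch rests on the assumption that a path is bad if and only if at least one of its links is bad, which is the kind of simplifying assumption the abstract warns may not hold in a real network.

```python
# Illustrative sketch only: Boolean network tomography on a hypothetical topology.
# Assumption (not from the paper): a path is "bad" iff at least one of its links is bad.

def infer_suspect_links(paths, path_is_good):
    """Return (cleared, suspects) from end-to-end observations.

    paths: dict mapping a path name to the set of link names it traverses.
    path_is_good: dict mapping a path name to True (good) or False (bad).
    """
    # Every link on a good path must itself be good under the assumption above.
    cleared = set()
    for name, links in paths.items():
        if path_is_good[name]:
            cleared |= links

    # For each bad path, the faulty link must be among its links not yet cleared.
    suspects = {}
    for name, links in paths.items():
        if not path_is_good[name]:
            suspects[name] = links - cleared
    return cleared, suspects


# Hypothetical topology: three end-to-end paths that share some links.
paths = {
    "p1": {"e1", "e2"},
    "p2": {"e1", "e3"},
    "p3": {"e4", "e3"},
}
# Hypothetical end-to-end measurements: only p1 performs well.
observations = {"p1": True, "p2": False, "p3": False}

cleared, suspects = infer_suspect_links(paths, observations)
print(sorted(cleared))
print({p: sorted(s) for p, s in suspects.items()})
```

In this toy example, the good path p1 clears links e1 and e2, so the bad path p2 can only be explained by e3. Path p3 remains ambiguous between e3 and e4, and nothing in the output quantifies how much to trust either conclusion, which mirrors the limitation the abstract highlights.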

