"Clouding up the Internet: how centralized is DNS traffic becoming?"
G. Moura, Sebastian Castro, W. Hardaker, M. Wullink, Cristian Hesselman
Proceedings of the ACM Internet Measurement Conference, 2020-10-27. DOI: 10.1145/3419394.3423625
Abstract: Concern has been mounting about Internet centralization over the last few years -- the consolidation of traffic, users, and infrastructure into the hands of a few market players. We measure DNS and computing centralization by analyzing DNS traffic collected at a DNS root server and at two country-code top-level domains (ccTLDs) -- one in Europe and the other in Oceania -- and show evidence of concentration: more than 30% of all queries to both ccTLDs are sent from five large cloud providers. We compare the clouds' resolver infrastructures and highlight a discrepancy in behavior: some cloud providers heavily employ IPv6, DNSSEC, and DNS over TCP, while others simply use unsecured DNS over UDP over IPv4. We show one positive side to centralization: once a cloud provider deploys a security feature -- such as QNAME minimization -- it quickly benefits a large number of users.
"On Landing and Internal Web Pages: The Strange Case of Jekyll and Hyde in Web Performance Measurement"
Waqar Aqeel, B. Chandrasekaran, A. Feldmann, B. Maggs
Proceedings of the ACM Internet Measurement Conference, 2020-10-27. DOI: 10.1145/3419394.3423626
Abstract: There is a rich body of literature on measuring and optimizing nearly every aspect of the web, including characterizing the structure and content of web pages, devising new techniques to load pages quickly, and evaluating such techniques. Virtually all of this prior work used a single page, namely the landing page (i.e., root document, "/"), of each web site as the representative of all pages on that site. In this paper, we characterize the differences between landing and internal (i.e., non-root) pages of 1000 web sites to demonstrate that the structure and content of internal pages differ substantially from those of landing pages, as well as from one another. We review more than a hundred studies published at top-tier networking conferences between 2015 and 2019, and highlight how, in light of these differences, the insights and claims of nearly two-thirds of the relevant studies would need to be revised for them to apply to internal pages. Going forward, we urge the networking community to include internal pages when measuring and optimizing the web. This recommendation, however, poses a non-trivial challenge: how do we select a set of representative internal web pages from a web site? To address the challenge, we have developed Hispar, a "top list" of 100,000 pages updated weekly, comprising both the landing pages and internal pages of around 2000 web sites. We make Hispar and the tools to recreate or customize it publicly available.
{"title":"Towards A User-Level Understanding of IPv6 Behavior","authors":"Frank H. Li, D. Freeman","doi":"10.1145/3419394.3423618","DOIUrl":"https://doi.org/10.1145/3419394.3423618","url":null,"abstract":"IP address classification and clustering are important tools for security practitioners in understanding attacks and employing proactive defenses. Over the past decade, network providers have begun transitioning from IPv4 to the more flexible IPv6, and a third of users now access online services over IPv6. However, there is no reason to believe that the properties of IPv4 addresses used for security applications should carry over to IPv6, and to date there has not yet been a large-scale study comparing the two protocols at a user (as opposed to a client or address) level. In this paper, we establish empirical grounding on how both ordinary users and attackers use IPv6 in practice, compared with IPv4. Using data on benign and abusive accounts at Facebook, one of the largest online platforms, we conduct user-centric analyses that assess the spatial and temporal properties of users' IP addresses, and IP-centric evaluations that characterize the user populations on IP addresses. We find that compared with IPv4, IPv6 addresses are less populated with users and shorter lived for each user. While both protocols exhibit outlying behavior, we determine that IPv6 outliers are significantly less prevalent and diverse, and more readily predicted. We also study the effects of subnetting IPv6 addresses at different prefix lengths, and find that while /56 subnets are closest in behavior to IPv4 addresses for malicious users, either the full IPv6 address or /64 subnets are most suitable for IP-based security applications, with both providing better performance tradeoffs than IPv4 addresses. Ultimately, our findings provide guidance on how security practitioners can handle IPv6 for applications such as blocklisting, rate limiting, and training machine learning models.","PeriodicalId":255324,"journal":{"name":"Proceedings of the ACM Internet Measurement Conference","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122621609","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
"Who is targeted by email-based phishing and malware? Measuring factors that differentiate risk"
Camelia Simoiu, Ali Zand, Kurt Thomas, Elie Bursztein
Proceedings of the ACM Internet Measurement Conference, 2020-10-27. DOI: 10.1145/3419394.3423617
Abstract: As technologies to defend against phishing and malware often impose an additional financial and usability cost on users (such as security keys), a question remains as to who should adopt these heightened protections. We measure over 1.2 billion email-based phishing and malware attacks against Gmail users to understand what factors place a person at heightened risk of attack. We find that attack campaigns are typically short-lived and, at first glance, indiscriminately target users on a global scale. However, by modeling the distribution of targeted users, we find that a person's demographics, location, email usage patterns, and security posture all significantly influence the likelihood of attack. Our findings represent a first step towards empirically identifying the most at-risk users.
"When Push Comes to Ads: Measuring the Rise of (Malicious) Push Advertising"
Karthika Subramani, Xingzi Yuan, Omid Setayeshfar, Phani Vadrevu, K. H. Lee, R. Perdisci
Proceedings of the ACM Internet Measurement Conference, 2020-10-27. DOI: 10.1145/3419394.3423631
Abstract: The rapid growth of online advertising has fueled the growth of ad-blocking software, such as new ad-blocking and privacy-oriented browsers or browser extensions. In response, both ad publishers and ad networks are constantly pursuing new strategies to keep up their revenues. To this end, ad networks have started to leverage the Web Push technology enabled by modern web browsers. As web push notifications (WPNs) are relatively new, their role in ad delivery has not yet been studied in depth. Furthermore, it is unclear to what extent WPN ads are being abused for malvertising (i.e., to deliver malicious ads). In this paper, we aim to fill this gap. Specifically, we propose a system called PushAdMiner that is dedicated to (1) automatically registering for and collecting a large number of web-based push notifications from publisher websites, (2) finding WPN-based ads among these notifications, and (3) discovering malicious WPN-based ad campaigns. Using PushAdMiner, we collected and analyzed 21,541 WPN messages by visiting thousands of different websites. Among these, our system identified 572 WPN ad campaigns, for a total of 5,143 WPN-based ads that were pushed by a variety of ad networks. Furthermore, we found that 51% of all WPN ads we collected are malicious, and that traditional ad-blockers and URL filters were mostly unable to block them, thus leaving a significant abuse vector unchecked.
"A Bird's Eye View of the World's Fastest Networks"
Debopam Bhattacherjee, Waqar Aqeel, G. Laughlin, B. Maggs, Ankit Singla
Proceedings of the ACM Internet Measurement Conference, 2020-10-27. DOI: 10.1145/3419394.3423620
Abstract: Low latency is of interest for a variety of applications. The most stringent latency requirements arise in financial trading, where sub-microsecond differences matter. As a result, firms in the financial technology sector are pushing networking technology to its limits, giving a peek into the future of consumer-grade terrestrial microwave networks. Here, we explore the world's most competitive network design race, which has played out over the past decade on the Chicago-New Jersey trading corridor. We systematically reconstruct licensed financial trading networks from publicly available information, and examine their latency, path redundancy, wireless link lengths, and operating frequencies.
{"title":"No WAN's Land: Mapping U.S. Broadband Coverage with Millions of Address Queries to ISPs","authors":"David Major, Ross Teixeira, Jonathan R. Mayer","doi":"10.1145/3419394.3423652","DOIUrl":"https://doi.org/10.1145/3419394.3423652","url":null,"abstract":"Accurate broadband coverage data is essential for public policy planning and government support programs. In the United States, the Federal Communications Commission is responsible for maintaining national broadband coverage data. Observers have panned the FCC's broadband maps for overstating availability, due to coarsegrained data collection and a low coverage threshold. We demonstrate a new approach to building broadband coverage maps: automated large-scale queries to the public availability checking tools offered by major internet service providers. We reverse engineer the coverage tools for nine major ISPs in the U.S., test over 19 million residential street addresses across nine states for service, and compare the results to the FCC's maps. Our results demonstrate that the FCC's coverage data significantly overstates the availability of each ISP's service, access to any broadband, connection speeds available to consumers, and competition in broadband markets. We also find that the FCC's data disproportionately overstates coverage in rural and minority communities. Our results highlight a promising direction for developing more accurate broadband maps and validating coverage reports.","PeriodicalId":255324,"journal":{"name":"Proceedings of the ACM Internet Measurement Conference","volume":"300 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114579617","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
"Mis-shapes, Mistakes, Misfits: An Analysis of Domain Classification Services"
Pelayo Vallina, V. Pochat, Álvaro Feal, Marius Paraschiv, Julien Gamba, Tim Burke, O. Hohlfeld, J. Tapiador, N. Vallina-Rodriguez
Proceedings of the ACM Internet Measurement Conference, 2020-10-27. DOI: 10.1145/3419394.3423660
Abstract: Domain classification services have applications in multiple areas, including cybersecurity, content blocking, and targeted advertising. Yet these services are often a black box in terms of their methodology for classifying domains, which makes it difficult to assess their strengths, aptness for specific applications, and limitations. In this work, we perform a large-scale analysis of 13 popular domain classification services on more than 4.4M hostnames. Our study empirically explores their methodologies, scalability limitations, label constellations, and their suitability for academic research as well as for other practical applications such as content filtering. We find that coverage varies enormously across providers, ranging from over 90% to below 1%. All services deviate from their documented taxonomy, hampering sound usage for research. Further, labels are highly inconsistent across providers, who show little agreement over domains, making it difficult to compare or combine these services. We also show how the dynamics of crowd-sourced efforts may be obstructed by scalability and coverage aspects as well as subjective disagreements among human labelers. Finally, through case studies, we show that most services are not fit for detecting specialized content for research or content-blocking purposes. We conclude with actionable recommendations on their usage based on our empirical insights and experience, focusing on how users should handle the significant disparities observed across services, both in technical solutions and in research.
"AS-Path Prepending: there is no rose without a thorn"
P. Marcos, Lars Prehn, Lucas Leal, A. Dainotti, A. Feldmann, M. Barcellos
Proceedings of the ACM Internet Measurement Conference, 2020-10-27. DOI: 10.1145/3419394.3423642
Abstract: Inbound traffic engineering (ITE) -- the process of announcing routes so as to, e.g., maximize revenue or minimize congestion -- is an essential task for Autonomous Systems (ASes). AS-Path Prepending (ASPP), which inflates the BGP AS path, is an easy-to-use and well-known ITE technique that routing manuals present as one of the first options for influencing other ASes' routing decisions: because AS-path length is the second tie-breaker in BGP best-path selection, ASPP can steer traffic onto other routes. We observe that origin ASes currently prepend more than 25% of all IPv4 prefixes. Although ASPP is simple and easy to use, operators and researchers view it very differently: some have questioned its need, effectiveness, and predictability, and others have voiced security concerns. Motivated by these mixed views, we revisit ASPP. Our longitudinal study shows that ASes widely deploy ASPP and that its utilization has slightly increased despite public statements against it. We surprisingly spot roughly 6k ASes originating at least one prefix with prepends that achieve no ITE goal. With active measurements, we show that ASPP's effectiveness as an ITE tool depends on the AS's location and the number of available upstreams, and that its security implications are practical; we also identify that more than 18% of the prepended prefixes contain unnecessary prepends that achieve no apparent goal other than amplifying existing routing security risks. We validate our findings in interviews with 20 network operators.
"Dissecting the Communication Latency in Distributed Deep Sparse Learning"
H. Pan, Zhenyu Li, Jianbo Dong, Zheng Cao, Tao Lan, Di Zhang, Gareth Tyson, Gaogang Xie
Proceedings of the ACM Internet Measurement Conference, 2020-10-27. DOI: 10.1145/3419394.3423637
Abstract: Distributed deep learning (DDL) uses a cluster of servers to train models in parallel. It has been applied to a multiplicity of problems, e.g., online advertising and friend recommendation. However, distributing the training means that the communication network becomes a key component in system performance. In this paper, we measure Alibaba's DDL system, with a focus on understanding the bottlenecks introduced by the network. Our key finding is that communication overhead has a surprisingly large impact on performance. To explore this, we analyse latency logs of 1.38M Remote Procedure Calls between servers during model training for two real applications of high-dimensional sparse data. We reveal the major contributors to the latency, including concurrent write/read operations across different connections and network connection management. We further observe a skewed distribution of update frequency for individual parameters, motivating us to propose using in-network computation capacity to offload server tasks.