{"title":"Demystifying the Messaging Platforms' Ecosystem Through the Lens of Twitter","authors":"M. Hoseini, P. Melo, Manoel Júnior, Fabrício Benevenuto, B. Chandrasekaran, A. Feldmann, Savvas Zannettou","doi":"10.1145/3419394.3423651","DOIUrl":"https://doi.org/10.1145/3419394.3423651","url":null,"abstract":"Online messaging platforms such as WhatsApp, Telegram, and Discord, each with hundreds of millions of users, are one of the dominant modes of communicating or interacting with one another. Despite the widespread use of public group chats, there exists no systematic or detailed characterization of these group chats. More importantly, there is a lack of general understanding of how these (public) groups differ in characteristics and use across the different platforms. We also do not know whether the messaging platforms expose personally identifiable information, and we lack a comprehensive view of the privacy implications of leaks for the users. In this work, we address these gaps by analyzing the messaging platforms' ecosystem through the lens of a popular social media platform, Twitter. We search for WhatsApp, Telegram, and Discord group URLs posted on Twitter over a period of 38 days and amass a set of 351K unique group URLs. We analyze the content accompanying group URLs on Twitter, finding interesting differences related to the topics of the groups across the multiple messaging platforms. By monitoring the characteristics of these groups, every day for more than a month, and, furthermore, by joining a subset of 616 groups across the different messaging platforms, we share key insights into the discovery of these groups via Twitter and reveal how these groups change over time. Finally, we analyze whether messaging platforms expose personally identifiable information. In this paper, we show that (a) Twitter is a rich source for discovering public groups in the different messaging platforms, (b) group URLs from messaging platforms are ephemeral, and (c) the considered messaging platforms expose personally identifiable information, with such leaks being more prevalent on WhatsApp than on Telegram and Discord.","PeriodicalId":255324,"journal":{"name":"Proceedings of the ACM Internet Measurement Conference","volume":"83 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123335881","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"How the Internet reacted to Covid-19: A perspective from Facebook's Edge Network","authors":"T. Böttger, G. Ibrahim, Ben Vallis","doi":"10.1145/3419394.3423621","DOIUrl":"https://doi.org/10.1145/3419394.3423621","url":null,"abstract":"The Covid-19 pandemic has led to unprecedented changes in the way people interact with each other, which as a consequence has increased pressure on the Internet. In this paper we provide a perspective of the scale of Internet traffic growth and how well the Internet coped with the increased demand as seen from Facebook's edge network. We use this infrastructure serving multiple large social networks and their related family of apps as vantage points to analyze how traffic and product properties changed during the early phase of the Covid-19 pandemic. We show that there have been changes in traffic demand, user behavior and user experience. We also show that different regions of the world saw different magnitudes of impact with predominantly less developed regions exhibiting larger performance degradations.","PeriodicalId":255324,"journal":{"name":"Proceedings of the ACM Internet Measurement Conference","volume":"62 15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123716798","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Who Touched My Browser Fingerprint?: A Large-scale Measurement Study and Classification of Fingerprint Dynamics","authors":"Song Li, Yinzhi Cao","doi":"10.1145/3419394.3423614","DOIUrl":"https://doi.org/10.1145/3419394.3423614","url":null,"abstract":"Browser fingerprints are dynamic, evolving with feature values changed over time. Previous fingerprinting datasets are either small-scale with only thousands of browser instances or without considering fingerprint dynamics. Thus, it remains unclear how an evolution-aware fingerprinting tool behaves in a real-world setting, e.g., on a website with millions of browser instances, let alone how fingerprint dynamics implicate privacy and security. In this paper, we perform the first large-scale study of millions of fingerprints to analyze fingerprint dynamics in a real-world website. Our measurement study answers the question of how and why fingerprints change over time by classifying fingerprint dynamics into three categories based on their causes. Our measurement also yields several insights, e.g., we show that a state-of-the-art fingerprinting tool performs poorly in terms of F1-Score and matching speed in this real-world setting.","PeriodicalId":255324,"journal":{"name":"Proceedings of the ACM Internet Measurement Conference","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122328483","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hiding in Plain Site: Detecting JavaScript Obfuscation through Concealed Browser API Usage","authors":"Shaown Sarker, Jordan Jueckstock, A. Kapravelos","doi":"10.1145/3419394.3423616","DOIUrl":"https://doi.org/10.1145/3419394.3423616","url":null,"abstract":"In this paper, we perform a large-scale measurement study of JavaScript obfuscation of browser APIs in the wild. We rely on a simple, but powerful observation: if dynamic analysis of a script's behavior (specifically, how it interacts with browser APIs) reveals browser API feature usage that cannot be reconciled with static analysis of the script's source code, then that behavior is obfuscated. To quantify and test this observation, we create a hybrid analysis platform using instrumented Chromium to log all browser API accesses by the scripts executed when a user visits a page. We filter the API access traces from our dynamic analysis through a static analysis tool that we developed in order to quantify how much and what kind of functionality is hidden on the web. When applying this methodology across the Alexa top 100k domains, we discover that 95.90% of the domains we successfully visited contain at least one script which invokes APIs that cannot be resolved from static analysis. We observe that eval is no longer the prominent obfuscation method on the web and we uncover families of novel obfuscation techniques that no longer rely on the use of eval.","PeriodicalId":255324,"journal":{"name":"Proceedings of the ACM Internet Measurement Conference","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131295801","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lumos5G","authors":"Arvind Narayanan, Eman Ramadan, Rishabh Mehta, Xinyue Hu, Qingxu Liu, Rostand A. K. Fezeu, Udhaya Kumar Dayalan, Saurabh Verma, Peiqi Ji, Tao Li, Fengqi Qian, Zhi-Li Zhang","doi":"10.1145/3419394.3423629","DOIUrl":"https://doi.org/10.1145/3419394.3423629","url":null,"abstract":"The emerging 5G services offer numerous new opportunities for networked applications. In this study, we seek to answer two key questions: i) is the throughput of mmWave 5G predictable, and ii) can we build \"good\" machine learning models for 5G throughput prediction? To this end, we conduct a measurement study of commercial mmWave 5G services in a major U.S. city, focusing on the throughput as perceived by applications running on user equipment (UE). Through extensive experiments and statistical analysis, we identify key UE-side factors that affect 5G performance and quantify to what extent the 5G throughput can be predicted. We then propose Lumos5G -- a composable machine learning (ML) framework that judiciously considers features and their combinations, and apply state-of-the-art ML techniques for making context-aware 5G throughput predictions. We demonstrate that our framework is able to achieve 1.37X to 4.84X reduction in prediction error compared to existing models. Our work can be viewed as a feasibility study for building what we envisage as a dynamic 5G throughput map (akin to Google traffic map). We believe this approach provides opportunities and challenges in building future 5G-aware apps.","PeriodicalId":255324,"journal":{"name":"Proceedings of the ACM Internet Measurement Conference","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114188758","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quantifying the Impact of Blocklisting in the Age of Address Reuse","authors":"Sivaramakrishnan Ramanathan, Anushah Hossain, J. Mirkovic, Minlan Yu, Sadia Afroz","doi":"10.1145/3419394.3423657","DOIUrl":"https://doi.org/10.1145/3419394.3423657","url":null,"abstract":"Blocklists, consisting of known malicious IP addresses, can be used as a simple method to block malicious traffic. However, blocklists can potentially lead to unjust blocking of legitimate users due to IP address reuse, where more users could be blocked than intended. IP addresses can be reused either at the same time (Network Address Translation) or over time (dynamic addressing). We propose two new techniques to identify reused addresses. We built a crawler using the BitTorrent Distributed Hash Table to detect NATed addresses and use the RIPE Atlas measurement logs to detect dynamically allocated address spaces. We then analyze 151 publicly available IPv4 blocklists to show the implications of reused addresses and find that 53-60% of blocklists contain reused addresses having about 30.6K-45.1K listings of reused addresses. We also find that reused addresses can potentially affect as many as 78 legitimate users for as many as 44 days.","PeriodicalId":255324,"journal":{"name":"Proceedings of the ACM Internet Measurement Conference","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114295442","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BGP Beacons, Network Tomography, and Bayesian Computation to Locate Route Flap Damping","authors":"Caitlin Gray, Clemens Mosig, R. Bush, C. Pelsser, M. Roughan, T. Schmidt, Matthias Wählisch","doi":"10.1145/3419394.3423624","DOIUrl":"https://doi.org/10.1145/3419394.3423624","url":null,"abstract":"Pinpointing autonomous systems which deploy specific inter-domain techniques such as Route Flap Damping (RFD) or Route Origin Validation (ROV) remains a challenge today. Previous approaches to detect per-AS behavior often relied on heuristics derived from passive and active measurements. Those heuristics, however, often lacked accuracy or imposed tight restrictions on the measurement methods. We introduce an algorithmic framework for network tomography, BeCAUSe, which implements Bayesian Computation for Autonomous Systems. Using our original combination of active probing and stochastic simulation, we present the first study to expose the deployment of RFD. In contrast to the expectation of the Internet community, we find that at least 9% of measured ASs enable RFD, most using deprecated vendor default configuration parameters. To illustrate the power of computational Bayesian methods we compare BeCAUSe with three RFD heuristics. Thereafter we successfully apply a generalization of the Bayesian method to a second challenge, measuring deployment of ROV.","PeriodicalId":255324,"journal":{"name":"Proceedings of the ACM Internet Measurement Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129399349","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accept the Risk and Continue: Measuring the Long Tail of Government https Adoption","authors":"S. Singanamalla, E. Jang, Richard J. Anderson, Tadayoshi Kohno, Kurtis Heimerl","doi":"10.1145/3419394.3423645","DOIUrl":"https://doi.org/10.1145/3419394.3423645","url":null,"abstract":"Across the world, government websites are expected to be reliable sources of information, regardless of their view count. Interactions with these websites often contain sensitive information, such as identity, medical, or legal data, whose integrity must be protected for citizens to remain safe. To better understand the government website ecosystem, we measure the adoption of https including the \"long tail\" of government websites around the world, which are typically not captured in the top-million datasets used for such studies. We identify and measure major categories and frequencies of https adoption errors, including misconfiguration of certificates via expiration, reuse of keys and serial numbers between unrelated government departments, use of insecure cryptographic protocols and keys, and untrustworthy root Certificate Authorities (CAs). Finally, we observe an overall lower https rate and a steeper dropoff with descending popularity among government sites compared to commercial websites, and provide recommendations to improve the usage of https in governments worldwide.","PeriodicalId":255324,"journal":{"name":"Proceedings of the ACM Internet Measurement Conference","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124357212","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unresolved Issues: Prevalence, Persistence, and Perils of Lame Delegations","authors":"Gautam Akiwate, M. Jonker, Raffaele Sommese, Ian D. Foster, G. Voelker, S. Savage, K. Claffy","doi":"10.1145/3419394.3423623","DOIUrl":"https://doi.org/10.1145/3419394.3423623","url":null,"abstract":"The modern Internet relies on the Domain Name System (DNS) to convert between human-readable domain names and IP addresses. However, the correct and efficient implementation of this function is jeopardized when the configuration data binding domains, nameservers and glue records is faulty. In particular, lame delegations, which occur when a nameserver responsible for a domain is unable to provide authoritative information about it, introduce both performance and security risks. We perform a broad-based measurement study of lame delegations, using both longitudinal zone data and active querying. We show that lame delegations of various kinds are common (affecting roughly 14% of domains we queried), that they can significantly degrade lookup latency (when they do not lead to outright failure), and that they expose hundreds of thousands of domains to adversarial takeover. We also explore circumstances that give rise to this surprising prevalence of lame delegations, including unforeseen interactions between the operational procedures of registrars and registries.","PeriodicalId":255324,"journal":{"name":"Proceedings of the ACM Internet Measurement Conference","volume":"180 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124487729","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Trufflehunter","authors":"A. Randall, Enze Liu, Gautam Akiwate, R. Padmanabhan, G. Voelker, S. Savage, Aaron Schulman","doi":"10.1145/3419394.3423640","DOIUrl":"https://doi.org/10.1145/3419394.3423640","url":null,"abstract":"This paper presents and evaluates Trufflehunter, a DNS cache snooping tool for estimating the prevalence of rare and sensitive Internet applications. Unlike previous efforts that have focused on small, misconfigured open DNS resolvers, Trufflehunter models the complex behavior of large multi-layer distributed caching infrastructures (e.g., Google Public DNS). In particular, using controlled experiments, we have inferred the caching strategies of the four most popular public DNS resolvers (Google Public DNS, Cloudflare Quad1, OpenDNS and Quad9). The large footprint of such resolvers presents an opportunity to observe rare domain usage, while preserving the privacy of the users accessing them. Using a controlled testbed, we evaluate how accurately Trufflehunter can estimate domain name usage across the U.S. Applying this technique in the wild, we provide a lower-bound estimate of the popularity of several rare and sensitive applications (most notably smartphone stalkerware) which are otherwise challenging to survey.","PeriodicalId":255324,"journal":{"name":"Proceedings of the ACM Internet Measurement Conference","volume":"130 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121792360","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}