{"title":"TLS Beyond the Browser: Combining End Host and Network Data to Understand Application Behavior","authors":"Blake Anderson, D. McGrew","doi":"10.1145/3355369.3355601","DOIUrl":"https://doi.org/10.1145/3355369.3355601","url":null,"abstract":"The Transport Layer Security (TLS) protocol has evolved in response to different attacks and is increasingly relied on to secure Internet communications. Web browsers have led the adoption of newer and more secure cryptographic algorithms and protocol versions, and thus improved the security of the TLS ecosystem. Other application categories, however, are increasingly using TLS, but too often are relying on obsolete and insecure protocol options. To understand in detail what applications are using TLS, and how they are using it, we developed a novel system for obtaining process information from end hosts and fusing it with network data to produce a TLS fingerprint knowledge base. This data has a rich set of context for each fingerprint, is representative of enterprise TLS deployments, and is automatically updated from ongoing data collection. Our dataset is based on 471 million endpoint-labeled and 8 billion unlabeled TLS sessions obtained from enterprise edge networks in five countries, plus millions of sessions from a malware analysis sandbox. We actively maintain an open source dataset that, at 4,500+ fingerprints and counting, is both the largest and most informative ever published. In this paper, we use the knowledge base to identify trends in enterprise TLS applications beyond the browser: application categories such as storage, communication, system, and email. We identify a rise in the use of TLS by nonbrowser applications and a corresponding decline in the fraction of sessions using version 1.3. Finally, we highlight the shortcomings of naïvely applying TLS fingerprinting to detect malware, and we present recent trends in malware's use of TLS such as the adoption of cipher suite randomization.","PeriodicalId":20640,"journal":{"name":"Proceedings of the Internet Measurement Conference 2018","volume":"43 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75877321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Look at the ECS Behavior of DNS Resolvers","authors":"R. Al-Dalky, M. Rabinovich, Kyle Schomp","doi":"10.1145/3355369.3355586","DOIUrl":"https://doi.org/10.1145/3355369.3355586","url":null,"abstract":"Content delivery networks (CDNs) commonly use DNS to map end-users to the best edge servers. A recently proposed EDNS0-Client-Subnet (ECS) extension allows recursive resolvers to include end-user subnet information in DNS queries, so that authoritative DNS servers, especially those belonging to CDNs, could use this information to improve user mapping. In this paper, we study the ECS behavior of ECS-enabled recursive resolvers from the perspectives of the opposite sides of a DNS interaction, the authoritative DNS servers of a major CDN and a busy DNS resolution service. We find a range of erroneous (i.e., deviating from the protocol specification) and detrimental (even if compliant) behaviors that may unnecessarily erode client privacy, reduce the effectiveness of DNS caching, diminish ECS benefits, and in some cases turn ECS from facilitator into an obstacle to authoritative DNS servers' ability to optimize user-to-edge-server mappings.","PeriodicalId":20640,"journal":{"name":"Proceedings of the Internet Measurement Conference 2018","volume":"29 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73330591","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"VisibleV8","authors":"Jordan Jueckstock, A. Kapravelos","doi":"10.1145/3355369.3355599","DOIUrl":"https://doi.org/10.1145/3355369.3355599","url":null,"abstract":"Modern web security and privacy research depends on accurate measurement of an often evasive and hostile web. No longer just a network of static, hyperlinked documents, the modern web is alive with JavaScript (JS) loaded from third parties of unknown trustworthiness. Dynamic analysis of potentially hostile JS currently presents a cruel dilemma: use heavyweight in-browser solutions that prove impossible to maintain, or use lightweight inline JS solutions that are detectable by evasive JS and which cannot match the scope of coverage provided by in-browser systems. We present VisibleV8, a dynamic analysis framework hosted inside V8, the JS engine of the Chrome browser, that logs native function or property accesses during any JS execution. At less than 600 lines (only 67 of which modify V8's existing behavior), our patches are lightweight and have been maintained from Chrome versions 63 through 72 without difficulty. VV8 consistently outperforms equivalent inline instrumentation, and it intercepts accesses impossible to instrument inline. This comprehensive coverage allows us to isolate and identify 46 JavaScript namespace artifacts used by JS code in the wild to detect automated browsing platforms and to discover that 29% of the Alexa top 50k sites load content which actively probes these artifacts.","PeriodicalId":20640,"journal":{"name":"Proceedings of the Internet Measurement Conference 2018","volume":"574 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77079075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Opening the Blackbox of VirusTotal: Analyzing Online Phishing Scan Engines","authors":"Peng Peng, Limin Yang, Linhai Song, Gang Wang","doi":"10.1145/3355369.3355585","DOIUrl":"https://doi.org/10.1145/3355369.3355585","url":null,"abstract":"Online scan engines such as VirusTotal are heavily used by researchers to label malicious URLs and files. Unfortunately, it is not well understood how the labels are generated and how reliable the scanning results are. In this paper, we focus on VirusTotal and its 68 third-party vendors to examine their labeling process on phishing URLs. We perform a series of measurements by setting up our own phishing websites (mimicking PayPal and IRS) and submitting the URLs for scanning. By analyzing the incoming network traffic and the dynamic label changes at VirusTotal, we reveal new insights into how VirusTotal works and the quality of their labels. Among other things, we show that vendors have trouble flagging all phishing sites, and even the best vendors missed 30% of our phishing sites. In addition, the scanning results are not immediately updated to VirusTotal after the scanning, and there are inconsistent results between VirusTotal scan and some vendors' own scanners. Our results reveal the need for developing more rigorous methodologies to assess and make use of the labels obtained from VirusTotal.","PeriodicalId":20640,"journal":{"name":"Proceedings of the Internet Measurement Conference 2018","volume":"76 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86094158","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Measuring Security Practices and How They Impact Security","authors":"Louis F. DeKoven, A. Randall, A. Mirian, Gautam Akiwate, Ansel Blume, L. Saul, Aaron Schulman, G. Voelker, S. Savage","doi":"10.1145/3355369.3355571","DOIUrl":"https://doi.org/10.1145/3355369.3355571","url":null,"abstract":"Security is a discipline that places significant expectations on lay users. Thus, there are a wide array of technologies and behaviors that we exhort end users to adopt and thereby reduce their security risk. However, the adoption of these \"best practices\" --- ranging from the use of antivirus products to actively keeping software updated --- is not well understood, nor is their practical impact on security risk well-established. This paper explores both of these issues via a large-scale empirical measurement study covering approximately 15,000 computers over six months. We use passive monitoring to infer and characterize the prevalence of various security practices in situ as well as a range of other potentially security-relevant behaviors. We then explore the extent to which differences in key security behaviors impact real-world outcomes (i.e., that a device shows clear evidence of having been compromised).","PeriodicalId":20640,"journal":{"name":"Proceedings of the Internet Measurement Conference 2018","volume":"47 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84790312","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An End-to-End, Large-Scale Measurement of DNS-over-Encryption: How Far Have We Come?","authors":"Chaoyi Lu, Baojun Liu, Zhou Li, S. Hao, Haixin Duan, Mingming Zhang, Chunying Leng, Y. Liu, Zaifeng Zhang, Jianping Wu","doi":"10.1145/3355369.3355580","DOIUrl":"https://doi.org/10.1145/3355369.3355580","url":null,"abstract":"DNS packets are designed to travel in unencrypted form through the Internet based on its initial standard. Recent discoveries show that real-world adversaries are actively exploiting this design vulnerability to compromise Internet users' security and privacy. To mitigate such threats, several protocols have been proposed to encrypt DNS queries between DNS clients and servers, which we jointly term as DNS-over-Encryption. While some proposals have been standardized and are gaining strong support from the industry, little has been done to understand their status from the view of global users. This paper performs by far the first end-to-end and large-scale analysis on DNS-over-Encryption. By collecting data from Internet scanning, user-end measurement and passive monitoring logs, we have gained several unique insights. In general, the service quality of DNS-over-Encryption is satisfying, in terms of accessibility and latency. For DNS clients, DNS-over-Encryption queries are less likely to be disrupted by in-path interception compared to traditional DNS, and the extra overhead is tolerable. However, we also discover several issues regarding how the services are operated. As an example, we find 25% DNS-over-TLS service providers use invalid SSL certificates. Compared to traditional DNS, DNS-over-Encryption is used by far fewer users but we have witnessed a growing trend. As such, we believe the community should push broader adoption of DNS-over-Encryption and we also suggest the service providers carefully review their implementations.","PeriodicalId":20640,"journal":{"name":"Proceedings of the Internet Measurement Conference 2018","volume":"130 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84452108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Measuring eWhoring","authors":"S. Pastrana, Alice Hutchings, Daniel R. Thomas, J. Tapiador","doi":"10.1145/3355369.3355597","DOIUrl":"https://doi.org/10.1145/3355369.3355597","url":null,"abstract":"eWhoring is the term used by offenders to refer to a type of online fraud in which cybersexual encounters are simulated for financial gain. Perpetrators use social engineering techniques to impersonate young women in online communities, e.g., chat or social networking sites. They engage potential customers in conversation with the aim of selling misleading sexual material -- mostly photographs and interactive video shows -- illicitly compiled from third-party sites. eWhoring is a popular topic in underground communities, with forums acting as a gateway into offending. Users not only share knowledge and tutorials, but also trade in goods and services, such as packs of images and videos. In this paper, we present a processing pipeline to quantitatively analyse various aspects of eWhoring. Our pipeline integrates multiple tools to crawl, annotate, and classify material in a semi-automatic way. It builds in precautions to safeguard against significant ethical issues, such as avoiding the researchers' exposure to pornographic material, and legal concerns, which were justified as some of the images were classified as child exploitation material. We use it to perform a longitudinal measurement of eWhoring activities in 10 specialised underground forums from 2008 to 2019. Our study focuses on three of the main eWhoring components: (i) the acquisition and provenance of images; (ii) the financial profits and monetisation techniques; and (iii) a social network analysis of the offenders, including their relationships, interests, and pathways before and after engaging in this fraudulent activity. We provide recommendations, including potential intervention approaches.","PeriodicalId":20640,"journal":{"name":"Proceedings of the Internet Measurement Conference 2018","volume":"505 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76394274","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scanning the Scanners: Sensing the Internet from a Massively Distributed Network Telescope","authors":"P. Richter, A. Berger","doi":"10.1145/3355369.3355595","DOIUrl":"https://doi.org/10.1145/3355369.3355595","url":null,"abstract":"Scanning of hosts on the Internet to identify vulnerable devices and services is a key component in many of today's cyberattacks. Tracking this scanning activity, in turn, provides an excellent signal to assess the current state-of-affairs for many vulnerabilities and their exploitation. So far, studies tracking scanning activity have relied on unsolicited traffic captured in darknets, focusing on random scans of the address space. In this work, we track scanning activity through the lens of unsolicited traffic captured at the firewalls of some 89,000 hosts of a major Content Distribution Network (CDN). Our vantage point has two distinguishing features compared to darknets: (i) it is distributed across some 1,300 networks, and (ii) its servers are live, offering services and thus emitting traffic. While all servers receive a baseline level of probing from Internet-wide scans, i.e., scans targeting random subsets of or the entire IPv4 space, we show that some 30% of all logged scan traffic is the result of localized scans. We find that localized scanning campaigns often target narrow regions in the address space, and that their characteristics in terms of target selection strategy and scanned services differ vastly from the more widely known Internet-wide scans. Our observations imply that conventional darknets can only partially illuminate scanning activity, and may severely underestimate widespread attempts to scan and exploit individual services in specific prefixes or networks. Our methods can be adapted for individual network operators to assess if they are subjected to targeted scanning activity.","PeriodicalId":20640,"journal":{"name":"Proceedings of the Internet Measurement Conference 2018","volume":"47 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74631827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cache Me If You Can: Effects of DNS Time-to-Live","authors":"G. Moura, J. Heidemann, R. Schmidt, W. Hardaker","doi":"10.1145/3355369.3355568","DOIUrl":"https://doi.org/10.1145/3355369.3355568","url":null,"abstract":"DNS depends on extensive caching for good performance, and every DNS zone owner must set Time-to-Live (TTL) values to control their DNS caching. Today there is relatively little guidance backed by research about how to set TTLs, and operators must balance conflicting demands of caching against agility of configuration. Exactly how TTL value choices affect operational networks is quite challenging to understand due to interactions across the distributed DNS service, where resolvers receive TTLs in different ways (answers and hints), TTLs are specified in multiple places (zones and their parent's glue), and while DNS resolution must be security-aware. This paper provides the first careful evaluation of how these multiple, interacting factors affect the effective cache lifetimes of DNS records, and provides recommendations for how to configure DNS TTLs based on our findings. We provide recommendations in TTL choice for different situations, and for where they must be configured. We show that longer TTLs have significant promise in reducing latency, reducing it from 183 ms to 28.7 ms for one country-code TLD.","PeriodicalId":20640,"journal":{"name":"Proceedings of the Internet Measurement Conference 2018","volume":"187 2 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81092462","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Characterizing JSON Traffic Patterns on a CDN","authors":"Santiago Vargas, U. Goel, Moritz Steiner, A. Balasubramanian","doi":"10.1145/3355369.3355594","DOIUrl":"https://doi.org/10.1145/3355369.3355594","url":null,"abstract":"Content delivery networks serve a major fraction of the Internet traffic, and their geographically deployed infrastructure makes them a good vantage point to observe traffic access patterns. We perform a large-scale investigation to characterize Web traffic patterns observed from a major CDN infrastructure. Specifically, we discover that responses with application/json content-type form a growing majority of all HTTP requests. As a result, we seek to understand what types of devices and applications are requesting JSON objects and explore opportunities to optimize CDN delivery of JSON traffic. Our study shows that mobile applications account for at least 52% of JSON traffic on the CDN and embedded devices account for another 12% of all JSON traffic. We also find that more than 55% of JSON traffic on the CDN is uncacheable, showing that a large portion of JSON traffic on the CDN is dynamic. By further looking at patterns of periodicity in requests, we find that 6.3% of JSON traffic is periodically requested and reflects the use of (partially) autonomous software systems, IoT devices, and other kinds of machine-to-machine communication. Finally, we explore dependencies in JSON traffic through the lens of ngram models and find that these models can capture patterns between subsequent requests. We can potentially leverage this to prefetch requests, improving the cache hit ratio.","PeriodicalId":20640,"journal":{"name":"Proceedings of the Internet Measurement Conference 2018","volume":"7 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87056063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}