{"title":"On the capacity of thermal covert channels in multicores","authors":"D. Bartolini, Philipp Miedl, L. Thiele","doi":"10.1145/2901318.2901322","DOIUrl":"https://doi.org/10.1145/2901318.2901322","url":null,"abstract":"Modern multicore processors feature easily accessible temperature sensors that provide useful information for dynamic thermal management. These sensors were recently shown to be a potential security threat, since otherwise isolated applications can exploit them to establish a thermal covert channel and leak restricted information. Previous research showed experiments that document the feasibility of (low-rate) communication over this channel, but did not further analyze its fundamental characteristics. For this reason, the important questions of quantifying the channel capacity and achievable rates remain unanswered. To address these questions, we devise and exploit a new methodology that leverages both theoretical results from information theory and experimental data to study these thermal covert channels on modern multicores. We use spectral techniques to analyze data from two representative platforms and estimate the capacity of the channels from a source application to temperature sensors on the same or different cores. We estimate the capacity to be in the order of 300 bits per second (bps) for the same-core channel, i.e., when reading the temperature on the same core where the source application runs, and in the order of 50 bps for the 1-hop channel, i.e., when reading the temperature of the core physically next to the one where the source application runs. Moreover, we show a communication scheme that achieves rates of more than 45 bps on the same-core channel and more than 5 bps on the 1-hop channel, with less than 1% error probability. The highest rate shown in previous work was 1.33 bps on the 1-hop channel with 11% error probability.","PeriodicalId":20737,"journal":{"name":"Proceedings of the Eleventh European Conference on Computer Systems","volume":"46 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90266094","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Yoda: a highly available layer-7 load balancer","authors":"Rohan Gandhi, Charlie Hu, Ming Zhang","doi":"10.1145/2901318.2901352","DOIUrl":"https://doi.org/10.1145/2901318.2901352","url":null,"abstract":"Layer-7 load balancing is a foundational building block of online services. The lack of offerings from major public cloud providers have left online services to build their own load balancers (LB), or use third-party LB design such as HAProxy. The key problem with such proxy-based design is each proxy instance is a single point of failure, as upon its failure, the TCP flow state for the connections with the client and server is lost which breaks the user flows. This significantly affects user experience and online services revenue. In this paper, we present Yoda, a highly available, scalable and low-latency L7-LB-as-a-service in a public cloud. Yoda is based on two design principles we propose for achieving high availability of a L7 LB: decoupling the flow state from the LB instances and storing it in a persistent storage, and leveraging the L4 LB service to enable each L7 LB instance to use the virtual IP in interacting with both the client and the server (called front-and-back indirection). Our evaluation of Yoda prototype on a 60-VM testbed in Windows Azure shows the overhead of decoupling TCP state into a persistent storage is very low (<1 msec), and Yoda maintains all flows during LB instance failures, addition, removal, as well as user policy updates. Our simulation driven by a one-day trace from production online services show that compared to using Yoda by each tenant, Yoda-as-a-service reduces L7 LB instance cost for the tenants by 3.7x while providing 4x more redundancy.","PeriodicalId":20737,"journal":{"name":"Proceedings of the Eleventh European Conference on Computer Systems","volume":"21 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88148415","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dmitrii Kuvaiskii, Rasha Faqeh, Pramod Bhatotia, P. Felber, C. Fetzer
{"title":"HAFT: hardware-assisted fault tolerance","authors":"Dmitrii Kuvaiskii, Rasha Faqeh, Pramod Bhatotia, P. Felber, C. Fetzer","doi":"10.1145/2901318.2901339","DOIUrl":"https://doi.org/10.1145/2901318.2901339","url":null,"abstract":"Transient hardware faults during the execution of a program can cause data corruptions. We present HAFT, a fault tolerance technique using hardware extensions of commodity CPUs to protect unmodified multithreaded applications against such corruptions. HAFT utilizes instruction-level redundancy for fault detection and hardware transactional memory for fault recovery. We evaluated HAFT with Phoenix and PARSEC benchmarks. The observed normalized runtime is 2x, with 98.9% of the injected data corruptions being detected and 91.2% being corrected. To demonstrate the effectiveness of HAFT, we applied it to real-world case studies including Memcached, Apache, and SQLite.","PeriodicalId":20737,"journal":{"name":"Proceedings of the Eleventh European Conference on Computer Systems","volume":"17 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79435542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Vaggelis Atlidakis, Jeremy Andrus, Roxana Geambasu, Dimitris Mitropoulos, Jason Nieh
{"title":"POSIX abstractions in modern operating systems: the old, the new, and the missing","authors":"Vaggelis Atlidakis, Jeremy Andrus, Roxana Geambasu, Dimitris Mitropoulos, Jason Nieh","doi":"10.1145/2901318.2901350","DOIUrl":"https://doi.org/10.1145/2901318.2901350","url":null,"abstract":"The POSIX standard, developed 25 years ago, comprises a set of operating system (OS) abstractions that aid application portability across UNIX-based OSes. While OSes and applications have evolved tremendously over the last 25 years, POSIX, and the basic set of abstractions it provides, has remained largely unchanged. Little has been done to measure how and to what extent traditional POSIX abstractions are being used in modern OSes, and whether new abstractions are taking form, dethroning traditional ones. We explore these questions through a study of POSIX usage in modern desktop and mobile OSes: Android, OS X, and Ubuntu. Our results show that new abstractions are taking form, replacing several prominent traditional abstractions in POSIX. While the changes are driven by common needs and are conceptually similar across the three OSes, they are not converging on any new standard, increasing fragmentation.","PeriodicalId":20737,"journal":{"name":"Proceedings of the Eleventh European Conference on Computer Systems","volume":"57 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89900189","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yilong Geng, V. Jeyakumar, A. Kabbani, Mohammad Alizadeh
{"title":"Juggler: a practical reordering resilient network stack for datacenters","authors":"Yilong Geng, V. Jeyakumar, A. Kabbani, Mohammad Alizadeh","doi":"10.1145/2901318.2901334","DOIUrl":"https://doi.org/10.1145/2901318.2901334","url":null,"abstract":"We present Juggler, a practical reordering resilient network stack for datacenters that enables any packet to be sent on any path at any level of priority. Juggler adds functionality to the Generic Receive Offload layer at the entry of the network stack to put packets in order in a best-effort fashion. Juggler's design exploits the small packet delays in datacenter networks and the inherent burstiness of traffic to eliminate the negative effects of packet reordering almost entirely while keeping state for only a small number of flows at any given time. Extensive testbed experiments at 10Gb/s and 40Gb/s speeds show that Juggler is effective and lightweight: it prevents performance loss even with severe packet reordering while imposing low CPU overhead. We demonstrate the use of Juggler for per-packet multi-path load balancing and a novel system that provides bandwidth guarantees by dynamically prioritizing packets.","PeriodicalId":20737,"journal":{"name":"Proceedings of the Eleventh European Conference on Computer Systems","volume":"9 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79305435","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ana Klimovic, C. Kozyrakis, Eno Thereska, Binu John, Sanjeev Kumar
{"title":"Flash storage disaggregation","authors":"Ana Klimovic, C. Kozyrakis, Eno Thereska, Binu John, Sanjeev Kumar","doi":"10.1145/2901318.2901337","DOIUrl":"https://doi.org/10.1145/2901318.2901337","url":null,"abstract":"PCIe-based Flash is commonly deployed to provide datacenter applications with high IO rates. However, its capacity and bandwidth are often underutilized as it is difficult to design servers with the right balance of CPU, memory and Flash resources over time and for multiple applications. This work examines Flash disaggregation as a way to deal with Flash overprovisioning. We tune remote access to Flash over commodity networks and analyze its impact on workloads sampled from real datacenter applications. We show that, while remote Flash access introduces a 20% throughput drop at the application level, disaggregation allows us to make up for these overheads through resource-efficient scale-out. Hence, we show that Flash disaggregation allows scaling CPU and Flash resources independently in a cost effective manner. We use our analysis to draw conclusions about data and control plane issues in remote storage.","PeriodicalId":20737,"journal":{"name":"Proceedings of the Eleventh European Conference on Computer Systems","volume":"36 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78089422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jin Kyu Kim, Qirong Ho, Seunghak Lee, Xun Zheng, Wei Dai, Garth A. Gibson, E. Xing
{"title":"STRADS: a distributed framework for scheduled model parallel machine learning","authors":"Jin Kyu Kim, Qirong Ho, Seunghak Lee, Xun Zheng, Wei Dai, Garth A. Gibson, E. Xing","doi":"10.1145/2901318.2901331","DOIUrl":"https://doi.org/10.1145/2901318.2901331","url":null,"abstract":"Machine learning (ML) algorithms are commonly applied to big data, using distributed systems that partition the data across machines and allow each machine to read and update all ML model parameters --- a strategy known as data parallelism. An alternative and complimentary strategy, model parallelism, partitions the model parameters for non-shared parallel access and updates, and may periodically repartition the parameters to facilitate communication. Model parallelism is motivated by two challenges that data-parallelism does not usually address: (1) parameters may be dependent, thus naive concurrent updates can introduce errors that slow convergence or even cause algorithm failure; (2) model parameters converge at different rates, thus a small subset of parameters can bottleneck ML algorithm completion. We propose scheduled model parallelism (SchMP), a programming approach that improves ML algorithm convergence speed by efficiently scheduling parameter updates, taking into account parameter dependencies and uneven convergence. To support SchMP at scale, we develop a distributed framework STRADS which optimizes the throughput of SchMP programs, and benchmark four common ML applications written as SchMP programs: LDA topic modeling, matrix factorization, sparse least-squares (Lasso) regression and sparse logistic regression. By improving ML progress per iteration through SchMP programming whilst improving iteration throughput through STRADS we show that SchMP programs running on STRADS outperform non-model-parallel ML implementations: for example, SchMP LDA and SchMP Lasso respectively achieve 10x and 5x faster convergence than recent, well-established baselines.","PeriodicalId":20737,"journal":{"name":"Proceedings of the Eleventh European Conference on Computer Systems","volume":"12 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72985651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NChecker: saving mobile app developers from network disruptions","authors":"Xinxin Jin, Peng Huang, Tianyin Xu, Yuanyuan Zhou","doi":"10.1145/2901318.2901353","DOIUrl":"https://doi.org/10.1145/2901318.2901353","url":null,"abstract":"Most of today's mobile apps rely on the underlying networks to deliver key functions such as web browsing, file synchronization, and social networking. Compared to desktop-based networks, mobile networks are much more dynamic with frequent connectivity disruptions, network type switches, and quality changes, posing unique programming challenges for mobile app developers. As revealed in this paper, many mobile app developers fail to handle these intermittent network conditions in the mobile network programming. Consequently, network programming defects (NPDs) are pervasive in mobile apps, causing bad user experiences such as crashes, data loss, etc. Despite the development of network libraries in the hope of lifting the developers' burden, we observe that many app developers fail to use these libraries properly and still introduce NPDs. In this paper, we study the characteristics of the real-world NPDs in Android apps towards a deep understanding of their impacts, root causes, and code patterns. Driven by the study, we build NChecker, a practical tool to detect NPDs by statically analyzing Android app binaries. NChecker has been applied to hundreds of real Android apps and detected 4180 NPDs from 285 randomly-selected apps with a 94+% accuracy. Our further analysis of these defects reveals the common mistakes of app developers in working with the existing network libraries' abstractions, which provide insights for improving the usability of mobile network libraries.","PeriodicalId":20737,"journal":{"name":"Proceedings of the Eleventh European Conference on Computer Systems","volume":"28 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73731924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Andrew Newell, G. Kliot, Ishai Menache, Aditya Gopalan, Soramichi Akiyama, M. Silberstein
{"title":"Optimizing distributed actor systems for dynamic interactive services","authors":"Andrew Newell, G. Kliot, Ishai Menache, Aditya Gopalan, Soramichi Akiyama, M. Silberstein","doi":"10.1145/2901318.2901343","DOIUrl":"https://doi.org/10.1145/2901318.2901343","url":null,"abstract":"Distributed actor systems are widely used for developing interactive scalable cloud services, such as social networks and on-line games. By modeling an application as a dynamic set of lightweight communicating \"actors\", developers can easily build complex distributed applications, while the underlying runtime system deals with low-level complexities of a distributed environment. We present ActOp---a data-driven, application-independent runtime mechanism for optimizing end-to-end service latency of actor-based distributed applications. ActOp targets the two dominant factors affecting latency: the overhead of remote inter-actor communications across servers, and the intra-server queuing delay. ActOp automatically identifies frequently communicating actors and migrates them to the same server transparently to the running application. The migration decisions are driven by a novel scalable distributed graph partitioning algorithm which does not rely on a single server to store the whole communication graph, thereby enabling efficient actor placement even for applications with rapidly changing graphs (e.g., chat services). Further, each server autonomously reduces the queuing delay by learning an internal queuing model and configuring threads according to instantaneous request rate and application demands. We prototype ActOp by integrating it with Orleans -- a popular open-source actor system [4, 13]. Experiments with realistic workloads show latency improvements of up to 75% for the 99th percentile, up to 63% for the mean, with up to 2x increase in peak system throughput.","PeriodicalId":20737,"journal":{"name":"Proceedings of the Eleventh European Conference on Computer Systems","volume":"79 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74218158","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nathaniel Herman, J. Inala, Yihe Huang, Lillian Tsai, E. Kohler, B. Liskov, L. Shrira
{"title":"Type-aware transactions for faster concurrent code","authors":"Nathaniel Herman, J. Inala, Yihe Huang, Lillian Tsai, E. Kohler, B. Liskov, L. Shrira","doi":"10.1145/2901318.2901348","DOIUrl":"https://doi.org/10.1145/2901318.2901348","url":null,"abstract":"It is often possible to improve a concurrent system's performance by leveraging the semantics of its datatypes. We build a new software transactional memory (STM) around this observation. A conventional STM tracks read- and write-sets of memory words; even simple operations can generate large sets. Our STM, which we call STO, tracks abstract operations on transactional datatypes instead. Parts of the transactional commit protocol are delegated to these datatypes' implementations, which can use datatype semantics, and new commit protocol features, to reduce bookkeeping, limit false conflicts, and implement efficient concurrency control. We test these ideas on the STAMP benchmark suite for STM applications and on our own prior work, the Silo high-performance in-memory database, observing large performance improvements in both systems.","PeriodicalId":20737,"journal":{"name":"Proceedings of the Eleventh European Conference on Computer Systems","volume":"52 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86259525","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}