Proceedings of the 27th ACM Symposium on Operating Systems Principles: Latest Articles

Notary: a device for secure transaction approval

Proceedings of the 27th ACM Symposium on Operating Systems Principles Pub Date : 2019-10-27 DOI: 10.1145/3341301.3359661

Anish Athalye, A. Belay, M. Kaashoek, R. Morris, N. Zeldovich

Notary is a new hardware and software architecture for running isolated approval agents in the form factor of a USB stick with a small display and buttons. Approval agents allow factoring out critical security decisions, such as getting the user's approval to sign a Bitcoin transaction or to delete a backup, to a secure environment. The key challenge addressed by Notary is to securely switch between agents on the same device. Prior systems either avoid the problem by building single-function devices like a USB U2F key, or they provide weak isolation that is susceptible to kernel bugs, side channels, or Rowhammer-like attacks. Notary achieves strong isolation using reset-based switching, along with the use of physically separate systems-on-a-chip for agent code and for the kernel, and a machine-checked proof of both the hardware's register-transfer-level design and software, showing that reset-based switching leaks no state. Notary also provides a trustworthy I/O path between the agent code and the user, which prevents an adversary from tampering with the user's screen or buttons. We built a hardware/software prototype of Notary, using a combination of ARM and RISC-V processors. The prototype demonstrates that it is feasible to verify Notary's reset-based switching, and that Notary can support diverse agents, including cryptocurrencies and a transaction approval agent for traditional client-server applications such as websites. Measurements of reset-based switching show that it is fast enough for interactive use. We analyze security bugs in existing cryptocurrency hardware wallets, which aim to provide a similar form factor and feature set as Notary, and show that Notary's design avoids many bugs that affect them.

Citations: 20

A generic communication scheduler for distributed DNN training acceleration

Proceedings of the 27th ACM Symposium on Operating Systems Principles Pub Date : 2019-10-27 DOI: 10.1145/3341301.3359642

Yanghua Peng, Yibo Zhu, Yangrui Chen, Y. Bao, Bairen Yi, Chang Lan, Chuan Wu, Chuanxiong Guo

We present ByteScheduler, a generic communication scheduler for distributed DNN training acceleration. ByteScheduler is based on our principled analysis that partitioning and rearranging the tensor transmissions can result in optimal results in theory and good performance in real-world even with scheduling overhead. To make ByteScheduler work generally for various DNN training frameworks, we introduce a unified abstraction and a Dependency Proxy mechanism to enable communication scheduling without breaking the original dependencies in framework engines. We further introduce a Bayesian Optimization approach to auto-tune tensor partition size and other parameters for different training models under various networking conditions. ByteScheduler now supports TensorFlow, PyTorch, and MXNet without modifying their source code, and works well with both Parameter Server (PS) and all-reduce architectures for gradient synchronization, using either TCP or RDMA. Our experiments show that ByteScheduler accelerates training with all experimented system configurations and DNN models, by up to 196% (or 2.96X of original speed).
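The idea of partitioning and rearranging tensor transmissions can be illustrated with a toy single-link simulation. This is only a sketch under stated assumptions, not ByteScheduler's implementation: the function name `schedule`, the rule that lower layer indices get higher priority, and the link speed of one data unit per time unit are all hypothetical choices made for the example.

```python
import heapq


def schedule(tensors, chunk_size):
    """Simulate sending gradient tensors over one link.

    tensors: list of (ready_time, layer, size) tuples.
    Returns a list of (start_time, layer, offset, length) in send order.
    Assumptions (hypothetical): lower layer index means higher priority,
    and the link moves one unit of data per unit of time.
    """
    pending = sorted(tensors)          # ordered by ready_time
    heap = []                          # (layer, offset, length) chunks waiting
    order = []
    now = 0
    i = 0
    while i < len(pending) or heap:
        # If nothing is queued, jump ahead to the next tensor's ready time.
        if not heap and pending[i][0] > now:
            now = pending[i][0]
        # Partition every tensor that has become ready into chunks.
        while i < len(pending) and pending[i][0] <= now:
            _, layer, size = pending[i]
            for off in range(0, size, chunk_size):
                length = min(chunk_size, size - off)
                heapq.heappush(heap, (layer, off, length))
            i += 1
        # Send the highest-priority chunk; a chunk is never interrupted.
        layer, off, length = heapq.heappop(heap)
        order.append((now, layer, off, length))
        now += length
    return order
```

With tensors `[(0, 2, 4), (1, 1, 2), (2, 0, 2)]`, a chunk size of 2 lets the layer-0 tensor start at time 2, whereas a chunk size of 4 (no partitioning) delays it until time 4, because the large layer-2 tensor cannot be interrupted once it starts.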

Citations: 246

AutoMine

Proceedings of the 27th ACM Symposium on Operating Systems Principles Pub Date : 2019-10-27 DOI: 10.1145/3341301.3359633

Daniel Mawhirter, Bo Wu

Graph mining algorithms that aim at identifying structural patterns of graphs are typically more complex than graph computation algorithms such as breadth first search. Researchers have implemented several systems with high-level and flexible interfaces customized for tackling graph mining problems. However, we find that for triangle counting, one of the simplest graph mining problems, such systems can be several times slower than a single-threaded implementation of a straightforward algorithm. In this paper, we reveal the root causes of the severe inefficiencies of state-of-the-art graph mining systems and the challenges to address the performance problems. We build AutoMine, a single-machine system to provide both high-level interfaces and high performance for large-scale graph mining applications. The novelty of AutoMine comes from 1) a new representation of subgraph patterns and 2) compilation techniques that automatically generate efficient mining code with minimized memory consumption from a high-level abstraction. We have extensively evaluated AutoMine against 3 graph mining systems on 8 real-world graphs of different scales. Our experimental results show that AutoMine often produces several orders of magnitude better performance and can process very large graphs existing systems cannot handle.
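For readers unfamiliar with the baseline mentioned in the abstract, a straightforward single-threaded triangle counter might look like the sketch below. It is not AutoMine's generated code; the function name `count_triangles` and the assumption that vertex identifiers are mutually comparable (for example, integers) are choices made for this example.

```python
def count_triangles(edges):
    """Count triangles in an undirected simple graph given as (u, v) pairs.

    Assumes vertex ids can be compared with '<' (e.g. integers).
    Duplicate edges and self-loops are ignored.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Keep only neighbors with a larger id, so each triangle u < v < w
    # is discovered exactly once, from its smallest vertex u.
    higher = {u: {v for v in nbrs if v > u} for u, nbrs in adj.items()}
    total = 0
    for u, nbrs in higher.items():
        for v in nbrs:
            # Common higher neighbors of u and v close a triangle.
            total += len(nbrs & higher[v])
    return total
```

On the complete graph with four vertices this returns 4, one for each way of choosing three of the four vertices.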

Citations: 65

Honeycrisp: large-scale differentially private aggregation without a trusted core

Proceedings of the 27th ACM Symposium on Operating Systems Principles Pub Date : 2019-10-27 DOI: 10.1145/3341301.3359660

Edo Roth, D. Noble, B. Falk, Andreas Haeberlen

Recently, a number of systems have been deployed that gather sensitive statistics from user devices while giving differential privacy guarantees. One prominent example is the component in Apple's macOS and iOS devices that collects information about emoji usage and new words. However, these systems have been criticized for making unrealistic assumptions, e.g., by creating a very high "privacy budget" for answering queries, and by replenishing this budget every day, which results in a high worst-case privacy loss. However, it is not obvious whether such assumptions can be avoided if one requires a strong threat model and wishes to collect data periodically, instead of just once. In this paper, we show that, essentially, it is possible to have one's cake and eat it too. We describe a system called Honeycrisp whose privacy cost depends on how often the data changes, and not on how often a query is asked. Thus, if the data is relatively stable (as is likely the case, e.g., with emoji and word usage), Honeycrisp can answer periodic queries for many years, as long as the underlying data does not change too often. Honeycrisp accomplishes this by using a) the sparse-vector technique, and b) a combination of cryptographic techniques to enable global differential privacy without a trusted party. Using a prototype implementation, we show that Honeycrisp is efficient and can scale to large deployments.
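The sparse-vector technique named in the abstract is what makes the cost depend on how often a noisy answer crosses a threshold rather than on how many queries are asked. Below is a minimal sketch of its textbook "AboveThreshold" form in a centralized setting. It is not Honeycrisp's protocol, which additionally uses cryptography to avoid a trusted party; the function names and the noise scales (2Δ/ε for the threshold, 4Δ/ε per query) follow the standard textbook presentation and are assumptions here.

```python
import random


def laplace(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def above_threshold(query_answers, threshold, epsilon, sensitivity=1.0, rng=None):
    """Return the index of the first query whose noisy answer crosses the
    noisy threshold, or None if no query does.

    Textbook AboveThreshold: the threshold is perturbed once, each answer
    is perturbed independently, and the procedure halts at the first hit.
    Queries that stay below the threshold consume no additional budget.
    """
    rng = rng or random.Random()
    noisy_threshold = threshold + laplace(2.0 * sensitivity / epsilon, rng)
    for i, answer in enumerate(query_answers):
        noisy_answer = answer + laplace(4.0 * sensitivity / epsilon, rng)
        if noisy_answer >= noisy_threshold:
            return i
    return None
```

In a periodic-collection setting, each entry of `query_answers` could stand for "how much has the statistic changed since the last release"; as long as the data is stable, the loop keeps running without halting.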

Citations: 52

Taiji

Proceedings of the 27th ACM Symposium on Operating Systems Principles Pub Date : 2019-10-27 DOI: 10.1145/3341301.3359655

David Chou, T. Xu, K. Veeraraghavan, Andrew J. Newell, Sonia Margulis, Lin Xiao, Pol Mauri Ruiz, Justin Meza, Kiryong Ha, Shruti Padmanabha, Kevin Cole, D. Perelman

We present Taiji, a new system for managing user traffic for large-scale Internet services that accomplishes two goals: 1) balancing the utilization of data centers and 2) minimizing network latency of user requests. Taiji models edge-to-datacenter traffic routing as an assignment problem: assigning traffic objects at the edge to the data centers to satisfy service-level objectives. Taiji uses a constraint optimization solver to generate an optimal routing table that specifies the fractions of traffic each edge node will distribute to different data centers. Taiji continuously adjusts the routing table to accommodate the dynamics of user traffic and failure events that reduce capacity. Taiji leverages connections among users to selectively route traffic of highly-connected users to the same data centers based on fractions in the routing table. This routing strategy, which we term connection-aware routing, allows us to reduce query load on our backend storage by 17%. Taiji has been used in production at Facebook for more than four years and routes global traffic in a user-aware manner for several large-scale product services across dozens of edge nodes and data centers.
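One way an edge node could apply a routing table of traffic fractions is to hash each request key onto the unit interval and walk the cumulative fractions. The sketch below illustrates that idea only; it is not Taiji's implementation. The names `bucket`, `route`, and `route_user`, the use of SHA-256, and the `group_of` mapping (a stand-in for whatever grouping of highly connected users is available) are all hypothetical.

```python
import hashlib


def bucket(key):
    """Map a string key to a stable point in [0, 1)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def route(key, fractions):
    """Pick a data center for `key` given {data_center: fraction}.

    Fractions are assumed to sum to 1; the last data center absorbs
    any floating-point rounding shortfall.
    """
    if not fractions:
        raise ValueError("routing table is empty")
    point = bucket(key)
    cumulative = 0.0
    for dc, frac in fractions.items():
        cumulative += frac
        if point < cumulative:
            return dc
    return dc


def route_user(user_id, group_of, fractions):
    """Route by the user's group id when one is known, so that users in
    the same (hypothetical) group land on the same data center."""
    return route(group_of.get(user_id, user_id), fractions)
```

Because the hash is stable, the same key always maps to the same data center until the fractions change, and keying on a shared group id sends every member of that group to the same place.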

Citations: 15

Efficient scalable thread-safety-violation detection: finding thousands of concurrency bugs during testing

Proceedings of the 27th ACM Symposium on Operating Systems Principles Pub Date : 2019-10-27 DOI: 10.1145/3341301.3359638

Guangpu Li, Shan Lu, M. Musuvathi, Suman Nath, Rohan Padhye

Concurrency bugs are hard to find, reproduce, and debug. They often escape rigorous in-house testing, but result in large-scale outages in production. Existing concurrency-bug detection techniques unfortunately cannot be part of industry's integrated build and test environment due to some open challenges: how to handle code developed by thousands of engineering teams that uses a wide variety of synchronization mechanisms, how to report little/no false positives, and how to avoid excessive testing resource consumption. This paper presents TSVD, a thread-safety violation detector that addresses these challenges through a new design point in the domain of active testing. Unlike previous techniques that inject delays randomly or employ expensive synchronization analysis, TSVD uses lightweight monitoring of the calling behaviors of thread-unsafe methods, not any synchronization operations, to dynamically identify bug suspects. It then injects corresponding delays to drive the program towards thread-unsafe behaviors, actively learns from its ability or inability to do so, and persists its learning from one test run to the next. TSVD is deployed and regularly used in Microsoft and it has already found over 1000 thread-safety violations from thousands of projects. It detects more bugs than state-of-the-art techniques, mostly with just one test run.

Citations: 47

PipeDream: generalized pipeline parallelism for DNN training

Proceedings of the 27th ACM Symposium on Operating Systems Principles Pub Date : 2019-10-27 DOI: 10.1145/3341301.3359646

D. Narayanan, A. Harlap, Amar Phanishayee, V. Seshadri, Nikhil R. Devanur, G. Ganger, Phillip B. Gibbons, M. Zaharia

DNN training is extremely time-consuming, necessitating efficient multi-accelerator parallelization. Current approaches to parallelizing training primarily use intra-batch parallelization, where a single iteration of training is split over the available workers, but suffer from diminishing returns at higher worker counts. We present PipeDream, a system that adds inter-batch pipelining to intra-batch parallelism to further improve parallel training throughput, helping to better overlap computation with communication and reduce the amount of communication when possible. Unlike traditional pipelining, DNN training is bi-directional, where a forward pass through the computation graph is followed by a backward pass that uses state and intermediate data computed during the forward pass. Naïve pipelining can thus result in mismatches in state versions used in the forward and backward passes, or excessive pipeline flushes and lower hardware efficiency. To address these challenges, PipeDream versions model parameters for numerically correct gradient computations, and schedules forward and backward passes of different minibatches concurrently on different workers with minimal pipeline stalls. PipeDream also automatically partitions DNN layers among workers to balance work and minimize communication. Extensive experimentation with a range of DNN tasks, models, and hardware configurations shows that PipeDream trains models to high accuracy up to 5.3X faster than commonly used intra-batch parallelism techniques.
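The version-mismatch problem described in the abstract can be made concrete with a toy pipeline stage that remembers which parameter version each minibatch saw in its forward pass and reuses it in the backward pass. This is only an illustrative sketch, not PipeDream's code: the class `Stage`, its scalar "layer" y = w * x, and the plain SGD update are simplifying assumptions.

```python
class Stage:
    """A toy pipeline stage computing y = w * x with a scalar weight."""

    def __init__(self, weight, lr):
        self.weight = weight
        self.lr = lr
        self.stash = {}  # minibatch id -> (weight version, input) from forward

    def forward(self, mb_id, x):
        # Remember the exact weight and input used for this minibatch.
        self.stash[mb_id] = (self.weight, x)
        return self.weight * x

    def backward(self, mb_id, grad_out):
        # Use the stashed version, not the current weight, which may have
        # been updated by other minibatches since the forward pass.
        w, x = self.stash.pop(mb_id)
        grad_w = grad_out * x
        grad_in = grad_out * w
        self.weight -= self.lr * grad_w
        return grad_in
```

If two minibatches run forward before either runs backward, the second backward pass still differentiates against the weight its forward pass saw, even though the first backward pass has already changed the live weight.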

Citations: 520

KnightKing

Proceedings of the 27th ACM Symposium on Operating Systems Principles Pub Date : 2019-10-27 DOI: 10.1145/3341301.3359634

Ke Yang, Mingxing Zhang, Kang Chen, Xiaosong Ma, Yang Bai, Yong Jiang

Random walk on graphs has recently gained immense popularity as a tool for graph data analytics and machine learning. Currently, random walk algorithms are developed as individual implementations and suffer significant performance and scalability problems, especially with the dynamic nature of sophisticated walk strategies. We present KnightKing, the first general-purpose, distributed graph random walk engine. To address the unique interaction between a static graph and many dynamic walkers, it adopts an intuitive walker-centric computation model. The corresponding programming model allows users to easily specify existing or new random walk algorithms, facilitated by a new unified edge transition probability definition that applies across popular known algorithms. With KnightKing, these diverse algorithms benefit from its common distributed random walk execution engine, centered around an innovative rejection-based sampling mechanism that dramatically reduces the cost of higher-order random walk algorithms. Our evaluation confirms that KnightKing brings up to 4 orders of magnitude improvement in executing algorithms that currently can only be afforded with approximation solutions on large graphs.
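Rejection-based sampling for a higher-order walk can be sketched on a single machine as follows. The example uses a node2vec-style second-order bias with return parameter `p` and in-out parameter `q` on an unweighted graph; it is an illustration of the general technique, not KnightKing's engine, and the function names are hypothetical. The point is that each trial inspects only one candidate edge instead of computing the bias for every outgoing edge of the current vertex.

```python
import random


def edge_bias(adj, prev, cand, p, q):
    """node2vec-style bias of stepping to `cand` given the previous vertex."""
    if cand == prev:
        return 1.0 / p          # return to where we came from
    if cand in adj[prev]:
        return 1.0              # candidate is also a neighbor of prev
    return 1.0 / q              # candidate moves farther away


def next_vertex(adj, prev, cur, p, q, rng):
    """Sample the next vertex by rejection: propose a neighbor uniformly,
    then accept it with probability bias / upper_bound."""
    upper = max(1.0 / p, 1.0, 1.0 / q)
    while True:
        cand = rng.choice(adj[cur])
        if rng.random() * upper < edge_bias(adj, prev, cand, p, q):
            return cand


def walk(adj, start, length, p, q, rng):
    """Return a walk of `length` vertices (length >= 2) starting at `start`.

    adj maps each vertex to a non-empty list of neighbors. The membership
    test in edge_bias is linear in the degree here; a real engine would
    use a faster lookup.
    """
    path = [start, rng.choice(adj[start])]
    while len(path) < length:
        path.append(next_vertex(adj, path[-2], path[-1], p, q, rng))
    return path
```

A call such as `walk(adj, 0, 10, 1.0, 2.0, random.Random(1))` yields a ten-vertex path in which every consecutive pair is connected by an edge.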

Citations: 53

Aegean: replication beyond the client-server model

Proceedings of the 27th ACM Symposium on Operating Systems Principles Pub Date : 2019-10-27 DOI: 10.1145/3341301.3359663

Remzi Can Aksoy, Manos Kapritsos

Citations: 12

TASO

Proceedings of the 27th ACM Symposium on Operating Systems Principles Pub Date : 2019-10-27 DOI: 10.1145/3341301.3359630

Zhihao Jia, O. Padon, James P. Thomas, Todd Warszawski, M. Zaharia, A. Aiken

Existing deep neural network (DNN) frameworks optimize the computation graph of a DNN by applying graph transformations manually designed by human experts. This approach misses possible graph optimizations and is difficult to scale, as new DNN operators are introduced on a regular basis. We propose TASO, the first DNN computation graph optimizer that automatically generates graph substitutions. TASO takes as input a list of operator specifications and generates candidate substitutions using the given operators as basic building blocks. All generated substitutions are formally verified against the operator specifications using an automated theorem prover. To optimize a given DNN computation graph, TASO performs a cost-based backtracking search, applying the substitutions to find an optimized graph, which can be directly used by existing DNN frameworks. Our evaluation on five real-world DNN architectures shows that TASO outperforms existing DNN frameworks by up to 2.8X, while requiring significantly less human effort. For example, TensorFlow currently contains approximately 53,000 lines of manual optimization rules, while the operator specifications needed by TASO are only 1,400 lines of code.

Citations: 195