Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems: Latest Publications

Potluck: Cross-Application Approximate Deduplication for Computation-Intensive Mobile Applications
Peizhen Guo, Wenjun Hu
{"title":"Potluck: Cross-Application Approximate Deduplication for Computation-Intensive Mobile Applications","authors":"Peizhen Guo, Wenjun Hu","doi":"10.1145/3173162.3173185","DOIUrl":"https://doi.org/10.1145/3173162.3173185","url":null,"abstract":"Emerging mobile applications, such as cognitive assistance and augmented reality (AR) based gaming, are increasingly computation-intensive and latency-sensitive, while running on resource-constrained devices. The standard approaches to addressing these involve either offloading to a cloud(let) or local system optimizations to speed up the computation, often trading off computation quality for low latency. Instead, we observe that these applications often operate on similar input data from the camera feed and share common processing components, both within the same (type of) applications and across different ones. Therefore, deduplicating processing across applications could deliver the best of both worlds. In this paper, we present Potluck, to achieve approximate deduplication. At the core of the system is a cache service that stores and shares processing results between applications and a set of algorithms to process the input data to maximize deduplication opportunities. This is implemented as a background service on Android. Extensive evaluation shows that Potluck can reduce the processing latency for our AR and vision workloads by a factor of 2.5 to 10.","PeriodicalId":302876,"journal":{"name":"Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128586984","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 57
SmoothOperator: Reducing Power Fragmentation and Improving Power Utilization in Large-scale Datacenters
Chang-Hong Hsu, Qingyuan Deng, Jason Mars, Lingjia Tang
{"title":"SmoothOperator: Reducing Power Fragmentation and Improving Power Utilization in Large-scale Datacenters","authors":"Chang-Hong Hsu, Qingyuan Deng, Jason Mars, Lingjia Tang","doi":"10.1145/3173162.3173190","DOIUrl":"https://doi.org/10.1145/3173162.3173190","url":null,"abstract":"With the ever growing popularity of cloud computing and web services, Internet companies are in need of increased computing capacity to serve the demand. However, power has become a major limiting factor prohibiting the growth in industry: it is often the case that no more servers can be added to datacenters without surpassing the capacity of the existing power infrastructure. In this work, we first investigate the power utilization in Facebook datacenters. We observe that the combination of provisioning for peak power usage, highly fluctuating traffic, and multi-level power delivery infrastructure leads to significant power budget fragmentation problem and inefficiently low power utilization. To address this issue, our insight is that heterogeneity of power consumption patterns among different services provides opportunities to re-shape the power profile of each power node by re-distributing services. By grouping services with asynchronous peak times under the same power node, we can reduce the peak power of each node and thus creating more power head-rooms to allow more servers hosted, achieving higher throughput. Based on this insight, we develop a workload-aware service placement framework to systematically spread the service instances with synchronous power patterns evenly under the power supply tree, greatly reducing the peak power draw at power nodes. We then leverage dynamic power profile reshaping to maximally utilize the headroom unlocked by our placement framework. Our experiments based on real production workload and power traces show that we are able to host up to 13% more machines in production, without changing the underlying power infrastructure. Utilizing the unleashed power headroom with dynamic reshaping, we achieve up to an estimated total of 15% and 11% throughput improvement for latency-critical service and batch service respectively at the same time, with up to 44% of energy slack reduction.","PeriodicalId":302876,"journal":{"name":"Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122724468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 36
A Reconfigurable Energy Storage Architecture for Energy-harvesting Devices
A. Colin, E. Ruppel, Brandon Lucia
{"title":"A Reconfigurable Energy Storage Architecture for Energy-harvesting Devices","authors":"A. Colin, E. Ruppel, Brandon Lucia","doi":"10.1145/3173162.3173210","DOIUrl":"https://doi.org/10.1145/3173162.3173210","url":null,"abstract":"Battery-free, energy-harvesting devices operate using energy collected exclusively from their environment. Energy-harvesting devices allow maintenance-free deployment in extreme environments, but requires a power system to provide the right amount of energy when an application needs it. Existing systems must provision energy capacity statically based on an application's peak demand which compromises efficiency and responsiveness when not at peak demand. This work presents Capybara: a co-designed hardware/software power system with dynamically reconfigurable energy storage capacity that meets varied application energy demand. The Capybara software interface allows programmers to specify the energy mode of an application task. Capybara's runtime system reconfigures Capybara's hardware energy capacity to match application demand. Capybara also allows a programmer to write reactive application tasks that pre-allocate a burst of energy that it can spend in response to an asynchronous (e.g., external) event. We instantiated Capybara's hardware design in two EH devices and implemented three reactive sensing applications using its software interface. Capybara improves event detection accuracy by 2x-4x over statically-provisioned energy capacity, maintains response latency within 1.5x of a continuously-powered baseline, and enables reactive applications that are intractable with existing power systems.","PeriodicalId":302876,"journal":{"name":"Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127780432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 167
MAERI: Enabling Flexible Dataflow Mapping over DNN Accelerators via Reconfigurable Interconnects
Hyoukjun Kwon, A. Samajdar, T. Krishna
{"title":"MAERI: Enabling Flexible Dataflow Mapping over DNN Accelerators via Reconfigurable Interconnects","authors":"Hyoukjun Kwon, A. Samajdar, T. Krishna","doi":"10.1145/3173162.3173176","DOIUrl":"https://doi.org/10.1145/3173162.3173176","url":null,"abstract":"Deep neural networks (DNN) have demonstrated highly promising results across computer vision and speech recognition, and are becoming foundational for ubiquitous AI. The computational complexity of these algorithms and a need for high energy-efficiency has led to a surge in research on hardware accelerators. % for this paradigm. To reduce the latency and energy costs of accessing DRAM, most DNN accelerators are spatial in nature, with hundreds of processing elements (PE) operating in parallel and communicating with each other directly. DNNs are evolving at a rapid rate, and it is common to have convolution, recurrent, pooling, and fully-connected layers with varying input and filter sizes in the most recent topologies.They may be dense or sparse. They can also be partitioned in myriad ways (within and across layers) to exploit data reuse (weights and intermediate outputs). All of the above can lead to different dataflow patterns within the accelerator substrate. Unfortunately, most DNN accelerators support only fixed dataflow patterns internally as they perform a careful co-design of the PEs and the network-on-chip (NoC). In fact, the majority of them are only optimized for traffic within a convolutional layer. This makes it challenging to map arbitrary dataflows on the fabric efficiently, and can lead to underutilization of the available compute resources. DNN accelerators need to be programmable to enable mass deployment. For them to be programmable, they need to be configurable internally to support the various dataflow patterns that could be mapped over them. To address this need, we present MAERI, which is a DNN accelerator built with a set of modular and configurable building blocks that can easily support myriad DNN partitions and mappings by appropriately configuring tiny switches. MAERI provides 8-459% better utilization across multiple dataflow mappings over baselines with rigid NoC fabrics.","PeriodicalId":302876,"journal":{"name":"Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124092023","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 323
Automatic Hierarchical Parallelization of Linear Recurrences
Sepideh Maleki, Martin Burtscher
{"title":"Automatic Hierarchical Parallelization of Linear Recurrences","authors":"Sepideh Maleki, Martin Burtscher","doi":"10.1145/3173162.3173168","DOIUrl":"https://doi.org/10.1145/3173162.3173168","url":null,"abstract":"Linear recurrences encompass many fundamental computations including prefix sums and digital filters. Later result values depend on earlier result values in recurrences, making it a challenge to compute them in parallel. We present a new work- and space-efficient algorithm to compute linear recurrences that is amenable to automatic parallelization and suitable for hierarchical massively-parallel architectures such as GPUs. We implemented our approach in a domain-specific code generator that emits optimized CUDA code. Our evaluation shows that, for standard prefix sums and single-stage IIR filters, the generated code reaches the throughput of memory copy for large inputs, which cannot be surpassed. On higher-order prefix sums, it performs nearly as well as the fastest handwritten code from the literature. On tuple-based prefix sums and digital filters, our automatically parallelized code outperforms the fastest prior implementations.","PeriodicalId":302876,"journal":{"name":"Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems","volume":"23 2-3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124555084","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Session details: Session 8B: Potpourri
Yan Solihin
{"title":"Session details: Session 8B: Potpourri","authors":"Yan Solihin","doi":"10.1145/3252967","DOIUrl":"https://doi.org/10.1145/3252967","url":null,"abstract":"","PeriodicalId":302876,"journal":{"name":"Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121875879","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
FCatch: Automatically Detecting Time-of-fault Bugs in Cloud Systems
Haopeng Liu, Xu Wang, Guangpu Li, Shan Lu, Feng Ye, Chen Tian
{"title":"FCatch: Automatically Detecting Time-of-fault Bugs in Cloud Systems","authors":"Haopeng Liu, Xu Wang, Guangpu Li, Shan Lu, Feng Ye, Chen Tian","doi":"10.1145/3173162.3177161","DOIUrl":"https://doi.org/10.1145/3173162.3177161","url":null,"abstract":"It is crucial for distributed systems to achieve high availability. Unfortunately, this is challenging given the common component failures (i.e., faults). Developers often cannot anticipate all the timing conditions and system states under which a fault might occur, and introduce time-of-fault (TOF) bugs that only manifest when a node crashes or a message drops at a special moment. Although challenging, detecting TOF bugs is fundamental to developing highly available distributed systems. Unlike previous work that relies on fault injection to expose TOF bugs, this paper carefully models TOF bugs as a new type of concurrency bugs, and develops FCatch to automatically predict TOF bugs by observing correct execution. Evaluation on representative cloud systems shows that FCatch is effective, accurately finding severe TOF bugs.","PeriodicalId":302876,"journal":{"name":"Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126089233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 26
Datasize-Aware High Dimensional Configurations Auto-Tuning of In-Memory Cluster Computing
Zhibin Yu, Zhendong Bei, Xuehai Qian
{"title":"Datasize-Aware High Dimensional Configurations Auto-Tuning of In-Memory Cluster Computing","authors":"Zhibin Yu, Zhendong Bei, Xuehai Qian","doi":"10.1145/3173162.3173187","DOIUrl":"https://doi.org/10.1145/3173162.3173187","url":null,"abstract":"In-Memory cluster Computing (IMC) frameworks (e.g., Spark) have become increasingly important because they typically achieve more than 10× speedups over the traditional On-Disk cluster Computing (ODC) frameworks for iterative and interactive applications. Like ODC, IMC frameworks typically run the same given programs repeatedly on a given cluster with similar input dataset size each time. It is challenging to build performance model for IMC program because: 1) the performance of IMC programs is more sensitive to the size of input dataset, which is known to be difficult to be incorporated into a performance model due to its complex effects on performance; 2) the number of performance-critical configuration parameters in IMC is much larger than ODC (more than 40 vs. around 10), the high dimensionality requires more sophisticated models to achieve high accuracy. To address this challenge, we propose DAC, a datasize-aware auto-tuning approach to efficiently identify the high dimensional configuration for a given IMC program to achieve optimal performance on a given cluster. DAC is a significant advance over the state-of-the-art because it can take the size of input dataset and 41 configuration parameters as the parameters of the performance model for a given IMC program, --- unprecedented in previous work. It is made possible by two key techniques: 1) Hierarchical Modeling (HM), which combines a number of individual sub-models in a hierarchical manner; 2) Genetic Algorithm (GA) is employed to search the optimal configuration. To evaluate DAC, we use six typical Spark programs, each with five different input dataset sizes. The evaluation results show that DAC improves the performance of six typical Spark programs, each with five different input dataset sizes compared to default configurations by a factor of 30.4x on average and up to 89x. We also report that the geometric mean speedups of DAC over configurations by default, expert, and RFHOC are 15.4x, 2.3x, and 1.5x, respectively.","PeriodicalId":302876,"journal":{"name":"Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126092941","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 63
FirmUp: Precise Static Detection of Common Vulnerabilities in Firmware
Yaniv David, Nimrod Partush, Eran Yahav
{"title":"FirmUp: Precise Static Detection of Common Vulnerabilities in Firmware","authors":"Yaniv David, Nimrod Partush, Eran Yahav","doi":"10.1145/3173162.3177157","DOIUrl":"https://doi.org/10.1145/3173162.3177157","url":null,"abstract":"We present a static, precise, and scalable technique for finding CVEs (Common Vulnerabilities and Exposures) in stripped firmware images. Our technique is able to efficiently find vulnerabilities in real-world firmware with high accuracy. Given a vulnerable procedure in an executable binary and a firmware image containing multiple stripped binaries, our goal is to detect possible occurrences of the vulnerable procedure in the firmware image. Due to the variety of architectures and unique tool chains used by vendors, as well as the highly customized nature of firmware, identifying procedures in stripped firmware is extremely challenging. Vulnerability detection requires not only pairwise similarity between procedures but also information about the relationships between procedures in the surrounding executable. This observation serves as the foundation for a novel technique that establishes a partial correspondence between procedures in the two binaries. We implemented our technique in a tool called FirmUp and performed an extensive evaluation over 40 million procedures, over 4 different prevalent architectures, crawled from public vendor firmware images. We discovered 373 vulnerabilities affecting publicly available firmware, 147 of them in the latest available firmware version for the device. A thorough comparison of FirmUp to previous methods shows that it accurately and effectively finds vulnerabilities in firmware, while outperforming the detection rate of the state of the art by 45% on average.","PeriodicalId":302876,"journal":{"name":"Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems","volume":"349 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123942054","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 77
Frightening Small Children and Disconcerting Grown-ups: Concurrency in the Linux Kernel
J. Alglave, Luc Maranget, P. McKenney, A. Parri, A. S. Stern
{"title":"Frightening Small Children and Disconcerting Grown-ups: Concurrency in the Linux Kernel","authors":"J. Alglave, Luc Maranget, P. McKenney, A. Parri, A. S. Stern","doi":"10.1145/3173162.3177156","DOIUrl":"https://doi.org/10.1145/3173162.3177156","url":null,"abstract":"Concurrency in the Linux kernel can be a contentious topic. The Linux kernel mailing list features numerous discussions related to consistency models, including those of the more than 30 CPU architectures supported by the kernel and that of the kernel itself. How are Linux programs supposed to behave? Do they behave correctly on exotic hardware? A formal model can help address such questions. Better yet, an executable model allows programmers to experiment with the model to develop their intuition. Thus we offer a model written in the cat language, making it not only formal, but also executable by the herd simulator. We tested our model against hardware and refined it in consultation with maintainers. Finally, we formalised the fundamental law of the Read-Copy-Update synchronisation mechanism, and proved that one of its implementations satisfies this law.","PeriodicalId":302876,"journal":{"name":"Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131426373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 43