Operating Systems Review (ACM): Latest Articles

Using Local Cache Coherence for Disaggregated Memory Systems
Operating Systems Review (ACM) Pub Date : 2023-06-26 DOI: 10.1145/3606557.3606561
I. Calciu, M. Imran, Ivan Puddu, Sanidhya Kashyap, H. Maruf, O. Mutlu, Aasheesh Kolli
Abstract: Disaggregated memory provides many cost savings and resource provisioning benefits for current datacenters, but software systems enabling disaggregated memory access result in high performance penalties. These systems require intrusive code changes to port applications for disaggregated memory or employ slow virtual memory mechanisms to avoid code changes. Such mechanisms result in high-overhead page faults to access remote data and high dirty data amplification when tracking changes to cached data at page granularity. In this paper, we propose a fundamentally new approach for disaggregated memory systems, based on the observation that we can use local cache coherence to track applications' memory accesses transparently, without code changes, at cache-line granularity. This simple idea (1) eliminates page faults from the application critical path when accessing remote data, and (2) decouples the application memory access tracking from the virtual memory page size, enabling cache-line granularity dirty data tracking and eviction. Using this observation, we implemented a new software runtime for disaggregated memory that improves average memory access time and reduces dirty data amplification.
Vol. 57, No. 1, pp. 21-28
Citations: 0
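To make the "dirty data amplification" in the abstract above concrete, here is a minimal sketch, not the authors' runtime. It assumes 4096-byte pages and 64-byte cache lines (common x86 values that the abstract does not state) and a hypothetical list of modified byte addresses, and compares how many bytes must be written back when every tracking unit containing a dirty byte is flushed whole, at page versus cache-line granularity.

```python
PAGE_SIZE = 4096  # assumed virtual memory page size, in bytes
LINE_SIZE = 64    # assumed cache-line size, in bytes


def writeback_bytes(dirty_addrs, granularity):
    """Bytes written back if each unit holding at least one dirty byte is flushed whole."""
    dirty_units = {addr // granularity for addr in dirty_addrs}
    return len(dirty_units) * granularity


def amplification(dirty_addrs, granularity):
    """Ratio of bytes written back to distinct bytes actually modified."""
    modified = len(set(dirty_addrs))
    if modified == 0:
        raise ValueError("no modified bytes; amplification is undefined")
    return writeback_bytes(dirty_addrs, granularity) / modified


if __name__ == "__main__":
    # Hypothetical workload: three 8-byte stores, each landing on a different page.
    writes = [base + i for base in (0, 5000, 12288) for i in range(8)]
    for name, size in (("page", PAGE_SIZE), ("line", LINE_SIZE)):
        print(f"{name}: {writeback_bytes(writes, size)} bytes, "
              f"{amplification(writes, size):.0f}x")
```

With these assumed sizes, the three scattered 8-byte stores force 12288 bytes (512x) of write-back when tracked per page, but only 192 bytes (8x) when tracked per cache line, which is the gap that cache-line granularity tracking is meant to close.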
Make It Real: An End-to-End Implementation of A Physically Disaggregated Data Center
Operating Systems Review (ACM) Pub Date : 2023-06-26 DOI: 10.1145/3606557.3606559
Yiying Zhang
Abstract: Resource disaggregation is an approach to separate different hardware resources into independent pools in a data center, so that these pools can be easily managed and their resources can be allocated in a tight but unbounded way. The past decade has seen research and practices in realizing the resource-disaggregation idea on regular servers. We advocate for a physically disaggregated data center, where disaggregated resource pools consist of hardware devices, not servers. Physical disaggregation could unlock another level of benefits in resource disaggregation, including further improved cost savings, easier maintenance and scaling, and more customization. This paper presents our efforts in building an end-to-end physically disaggregated data center, including the design and implementation of disaggregated hardware devices, networking systems for connecting these devices, operating systems for orchestrating them, and porting of traditional and cloud-computing applications to this physically disaggregated platform.
Vol. 57, No. 1, pp. 1-9
Citations: 0
Memory disaggregation: why now and what are the challenges
Operating Systems Review (ACM) Pub Date : 2023-06-26 DOI: 10.1145/3606557.3606563
M. Aguilera, Emmanuel Amaro, Nadav Amit, Erika Hunhoff, Anil Yelam, Gerd Zellweger
Abstract: Hardware disaggregation has emerged as one of the most fundamental shifts in how we build computer systems over the past decades. While disaggregation has been successful for several types of resources (storage, power, and others), memory disaggregation has yet to happen. We make the case that the time for memory disaggregation has arrived. We look at past successful disaggregation stories and learn that their success depended on two requirements: addressing a burning issue and being technically feasible. We examine memory disaggregation through this lens and find that both requirements are finally met. Once available, memory disaggregation will require software support to be used effectively. We discuss some of the challenges of designing an operating system that can utilize disaggregated memory for itself and its applications.
Vol. 57, No. 1, pp. 38-46
Citations: 1
Navigating Performance-Efficiency Tradeoffs in Serverless Computing: Deduplication to the Rescue!
Operating Systems Review (ACM) Pub Date : 2023-06-26 DOI: 10.1145/3606557.3606564
Divyanshu Saxena, T. Ji, Arjun Singhvi, Junaid Khalid, Aditya Akella
Abstract: Navigating the performance and efficiency trade-offs is critical for serverless platforms, where the providers ideally want to give the illusion of warm function startups while maintaining low resource costs. Limited controls, provided via toggling sandboxes between warm and cold states and keepalives, force operators to sacrifice significant resources to achieve good performance.
Vol. 57, No. 1, pp. 47-53
Citations: 0
Disaggregated GPU Acceleration for Serverless Applications
Operating Systems Review (ACM) Pub Date : 2023-06-26 DOI: 10.1145/3606557.3606560
Henrique Fingler, Zhiting Zhu, Esther Yoon, Zhipeng Jia, E. Witchel, C. Rossbach
Abstract: Serverless platforms have been attracting applications from traditional platforms because infrastructure management responsibilities are shifted from users to providers. Many applications well-suited to serverless environments could leverage GPU acceleration to enhance their performance. Unfortunately, current serverless platforms do not expose GPUs to serverless applications.
Vol. 57, No. 1, pp. 10-20
Citations: 0
Memory Disaggregation: Advances and Open Challenges
Operating Systems Review (ACM) Pub Date : 2023-05-06 DOI: 10.1145/3606557.3606562
H. Maruf, Mosharaf Chowdhury
Abstract: Compute and memory are tightly coupled within each server in traditional datacenters. Large-scale datacenter operators have identified this coupling as a root cause behind fleetwide resource underutilization and increasing Total Cost of Ownership (TCO). With the advent of ultra-fast networks and cache-coherent interfaces, memory disaggregation has emerged as a potential solution, whereby applications can leverage available memory even outside server boundaries. This paper summarizes the growing research landscape of memory disaggregation from a software perspective and introduces the challenges toward making it practical under current and future hardware trends. We also reflect on our seven-year journey in the SymbioticLab to build a comprehensive disaggregated memory system over ultra-fast networks. We conclude with some open challenges toward building next-generation memory disaggregation systems leveraging emerging cache-coherent interconnects.
Vol. 57, No. 1, pp. 29-37
Citations: 0
Positional Paper
Operating Systems Review (ACM) Pub Date : 2022-06-14 DOI: 10.1145/3544497.3544500
Y. Shkuro, B. Renard, Ashutosh Kumar Singh
Abstract: Application telemetry refers to measurements taken from software systems to assess their performance, availability, correctness, efficiency, and other aspects useful to operators, as well as to troubleshoot them when they behave abnormally. Many modern observability platforms support dimensional models of telemetry signals where the measurements are accompanied by additional dimensions used to identify either the resources described by the telemetry or the business-specific attributes of the activities (e.g., a customer identifier). However, most of these platforms lack any semantic understanding of the data, by not capturing any metadata about telemetry, from simple aspects such as units of measure or data types (treating all dimensions as strings) to more complex concepts such as purpose policies. This limits the ability of the platforms to provide a rich user experience, especially when dealing with different telemetry assets, for example, linking an anomaly in a time series with the corresponding subset of logs or traces, which requires semantic understanding of the dimensions in the respective data sets.
Vol. 56, No. 1, pp. 8-17
Citations: 0
Data-Aware Compression for HPC using Machine Learning
Operating Systems Review (ACM) Pub Date : 2022-06-14 DOI: 10.1145/3544497.3544508
Julius Plehn, A. Fuchs, Michael Kuhn, Jakob Lüttgau, T. Ludwig
Abstract: While compression can provide significant storage and cost savings, its use within HPC applications is often only of secondary concern. This is in part due to the inflexibility of existing approaches where a single compression algorithm has to be used throughout the whole application but also because insights into the behaviour of the algorithms within the context of individual applications are missing.
Vol. 56, No. 1, pp. 62-69
Citations: 0
Analysis and Workload Characterization of the CERN EOS Storage System
Operating Systems Review (ACM) Pub Date : 2022-06-14 DOI: 10.1145/3544497.3544507
Devashish R. Purandare, Daniel Bittman, E. L. Miller
Abstract: Modern, large-scale scientific computing runs on complex exascale storage systems that support even more complex data workloads. Understanding the data access and movement patterns is vital for informing the design of future iterations of existing systems and next-generation systems. Yet we are lacking in publicly available traces and tools to help us understand even one system in depth, let alone correlate long-term cross-system trends.
Vol. 56, No. 1, pp. 55-61
Citations: 0
An Intelligent Framework for Timely, Accurate, and Comprehensive Cloud Incident Detection
Operating Systems Review (ACM) Pub Date : 2022-06-14 DOI: 10.1145/3544497.3544499
Yichen Li, Xu Zhang, Shilin He, Zhuangbin Chen, Yu Kang, Jinyang Liu, Liqun Li, Yingnong Dang, Feng Gao, Zhangwei Xu, S. Rajmohan, Qingwei Lin, Dongmei Zhang, Michael R. Lyu
Abstract: Cloud incidents (service interruptions or performance degradation) dramatically degrade the reliability of large-scale cloud systems, causing customer dissatisfaction and revenue loss. After years of effort, cloud providers are able to solve most incidents automatically and rapidly. The secret of this ability is intelligent incident detection. Only when incidents are detected in a timely, accurate, and comprehensive manner can they be diagnosed and mitigated at a satisfactory speed. To overcome the limitations of traditional rule-based detection, we carried out years of incident detection research. We developed a comprehensive AIOps (Artificial Intelligence for IT Operations) framework for incident detection containing a set of data-driven methods. This paper shares our recent experience of developing and deploying such an intelligent incident detection system at Microsoft. We first discuss the real-world challenges of incident detection that constitute the pain points of engineers. Then, we summarize our intelligent solutions proposed in recent years to tackle these challenges. Finally, we show the deployment of the incident detection AIOps framework and demonstrate its practical benefits conveyed to Microsoft cloud services with real cases.
Vol. 56, No. 1, pp. 1-7
Citations: 5