Mark Aiken, Manuel Fähndrich, C. Hawblitzel, G. Hunt, J. Larus
{"title":"Deconstructing process isolation","authors":"Mark Aiken, Manuel Fähndrich, C. Hawblitzel, G. Hunt, J. Larus","doi":"10.1145/1178597.1178599","DOIUrl":"https://doi.org/10.1145/1178597.1178599","url":null,"abstract":"Most operating systems enforce process isolation through hardware protection mechanisms such as memory segmentation, page mapping, and differentiated user and kernel instructions. Singularity is a new operating system that uses software mechanisms to enforce process isolation. A software isolated process (SIP) is a process whose boundaries are established by language safety rules and enforced by static type checking. SIPs provide a low cost isolation mechanism that provides failure isolation and fast inter-process communication.To compare the performance of Singularity's SIPs against traditional isolation techniques, we implemented an optional hardware isolation mechanism. Protection domains are hardware-enforced address spaces, which can contain one or more SIPs. Domains can either run at the kernel's privilege level or be fully isolated from the kernel and run at the normal application privilege level. With protection domains, we can construct Singularity configurations that are similar to micro-kernel and monolithic kernel systems. We found that hardware-based isolation incurs non-trivial performance costs (up to 25--33%) and complicates system implementation. Software isolation has less than 5% overhead on these benchmarks.The lower run-time cost of SIPs makes their use feasible at a finer granularity than conventional processes. However, hardware isolation remains valuable as a defense-in-depth against potential failures in software isolation mechanisms. Singularity's ability to employ hardware isolation selectively enables careful balancing of the costs and benefits of each isolation technique.","PeriodicalId":130040,"journal":{"name":"Workshop on Memory System Performance and Correctness","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121961217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
G. Buehrer, Yen-kuang Chen, S. Parthasarathy, A. Nguyen, A. Ghoting, Daehyun Kim
{"title":"Efficient pattern mining on shared memory systems: implications for chip multiprocessor architectures","authors":"G. Buehrer, Yen-kuang Chen, S. Parthasarathy, A. Nguyen, A. Ghoting, Daehyun Kim","doi":"10.1145/1178597.1178603","DOIUrl":"https://doi.org/10.1145/1178597.1178603","url":null,"abstract":"Frequent pattern mining is a fundamental data mining process which has practical applications ranging from market basket data analysis to web link analysis. In this work, we show that state-of-the-art frequent pattern mining applications are inefficient when executing on a shared memory multiprocessor system, due primarily to poor utilization of the memory hierarchy. To improve the efficiency of these applications, we explore memory performance improvements, task partitioning strategies, and task queuing models designed to maximize the scalability of pattern mining on SMP systems. Empirically, we show that the proposed strategies afford significantly improved performance. We also discuss implications of this work in light of recent trends in micro-architecture design, particularly chip multiprocessors (CMPs).","PeriodicalId":130040,"journal":{"name":"Workshop on Memory System Performance and Correctness","volume":"88 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128581711","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Smarter garbage collection with simplifiers","authors":"Melissa E. O'Neill","doi":"10.1145/1178597.1178601","DOIUrl":"https://doi.org/10.1145/1178597.1178601","url":null,"abstract":"We introduce a method for providing lightweight daemons, called simplifiers, that attach themselves to program data. If a data item has a simplifier, the simplifier may be run automatically from time to time, seeking an opportunity to \"simplify\" the object in some way that improves the program's time or space performance.It is not uncommon for programs to improve their data structures as they traverse them, but these improvements must wait until such a traversal occurs. Simplifiers provide an alternative mechanism for making improvements that is not tied to the vagaries of normal control flow.Tracing garbage collectors can both support the simplifier abstraction and benefit from it. Because tracing collectors traverse program data structures, they can trigger simplifiers as part of the tracing process. (In fact, it is possible to view simplifiers as analogous to finalizers; whereas an object can have a finalizer that is run automatically when the object found to be dead, a simplifier can be run when the object is found to be live.)Simplifiers can aid efficient collection by simplifying objects before they are traced, thereby eliminating some data that would otherwise have been traced and saved by the collector. We present performance data to show that appropriately chosen simplifiers can lead to tangible space and speed benefits in practice.Different variations of simplifiers are possible, depending on the triggering mechanism and the synchronization policy. Some kinds of simplifier are already in use in mainstream systems in the form of ad-hoc garbage-collector extensions. For one kind of simplifier we include a complete and portable Java implementation that is less than thirty lines long.","PeriodicalId":130040,"journal":{"name":"Workshop on Memory System Performance and Correctness","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126401225","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Seven at one stroke: results from a cache-oblivious paradigm for scalable matrix algorithms","authors":"Michael D. Adams, David S. Wise","doi":"10.1145/1178597.1178604","DOIUrl":"https://doi.org/10.1145/1178597.1178604","url":null,"abstract":"A blossoming paradigm for block-recursive matrix algorithms is presented that, at once, attains excellent performance measured by• time• TLB misses• L1 misses• L2 misses• paging to disk• scaling on distributed processors, and• portability to multiple platforms.It provides a philosophy and tools that allow the programmer to deal with the memory hierarchy invisibly, from L1 and L2 to TLB, paging, and interprocessor communication. Used together, they provide a cache-oblivious style of programming.Plots are presented to support these claims on an implementation of Cholesky factorization crafted directly from the paradigm in C with a few intrinsic calls. The results in this paper focus on low-level performance, including the new Morton-hybrid representation to take advantage of hardware and compiler optimizations. In particular, this code beats Intel's Matrix Kernel Library and matches AMD's Core Math Library, losing a bit on L1 misses while winning decisively on TLB-misses.","PeriodicalId":130040,"journal":{"name":"Workshop on Memory System Performance and Correctness","volume":"110 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122772306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A flexible data to L2 cache mapping approach for future multicore processors","authors":"Lei Jin, Hyunjin Lee, Sangyeun Cho","doi":"10.1145/1178597.1178613","DOIUrl":"https://doi.org/10.1145/1178597.1178613","url":null,"abstract":"This paper proposes and studies a distributed L2 cache management approach through page-level data to cache slice mapping in a future processor chip comprising many cores. L2 cache management is a crucial multicore processor design aspect to overcome non-uniform cache access latency for high program performance and to reduce on-chip network traffic and related power consumption. Unlike previously studied \"pure\" hardware-based private and shared cache designs, the proposed OS-microarchitecture approach allows mimicking a wide spectrum of L2 caching policies without complex hardware support. Moreover, processors and cache slices can be isolated from each other without hardware modifications, resulting in improved chip reliability characteristics. We discuss the key design issues and implementation strategies of the proposed approach, and present an experimental result showing the promise of it.","PeriodicalId":130040,"journal":{"name":"Workshop on Memory System Performance and Correctness","volume":"91 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129027127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}