J. B. Chen, Yasuhiro Endo, Kee Chan, David Mazières, Antonio Dias, M. Seltzer, Michael D. Smith
{"title":"The measured performance of personal computer operating systems","authors":"J. B. Chen, Yasuhiro Endo, Kee Chan, David Mazières, Antonio Dias, M. Seltzer, Michael D. Smith","doi":"10.1145/224056.224079","DOIUrl":"https://doi.org/10.1145/224056.224079","url":null,"abstract":"This paper presents a comparative study of the performance of three operating systems that run on the personal computer architecture derived from the IBM-PC. The operating systems, Windows for Workgroups, Windows NT, and NetBSD (a freely available variant of the UNIX operating system), cover a broad range of system functionality and user requirements, from a single address space model to full protection with preemptive multi-tasking. Our measurements were enabled by hardware counters in Intel's Pentium processor that permit measurement of a broad range of processor events including instruction counts and on-chip cache miss counts. We used both microbenchmarks, which expose specific differences between the systems, and application workloads, which provide an indication of expected end-to-end performance. Our microbenchmark results show that accessing system functionality is often more expensive in Windows for Workgroups than in the other two systems due to frequent changes in machine mode and the use of system call hooks. When running native applications, Windows NT is more efficient than Windows, but it incurs overhead similar to that of a microkernel since its application interface (the Win32 API) is implemented as a user-level server. Overall, system functionality can be accessed most efficiently in NetBSD ; we attribute this to its monolithic structure, and to the absence of the complications created by hardware backwards compatibility requirements in the other systems. Measurements of application performance show that although the impact of these differences is significant in terms of instruction counts and other hardware events (often a factor of 2 to 7 difference between the systems), overall performance is sometimes determined by the functionality provided by specific subsystems, such as the graphics subsystem or the file system buffer cache.","PeriodicalId":168455,"journal":{"name":"Proceedings of the fifteenth ACM symposium on Operating systems principles","volume":"29-32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115440387","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. Wilkes, Richard A. Golding, Carl Staelin, Tim Sullivan
{"title":"The HP AutoRAID hierarchical storage system","authors":"J. Wilkes, Richard A. Golding, Carl Staelin, Tim Sullivan","doi":"10.1145/224056.224065","DOIUrl":"https://doi.org/10.1145/224056.224065","url":null,"abstract":"Con@uring redundant disk arrays is a black art. To configure an array properly, a system administrator must understand the details of both the array and the workload it will support. Incorrect understanding of either, or changes in the workload over time, can lead to poor performance, We present a solution to this problem: a two-level storage hierarchy implemented inside a single disk-array controller. In the upper level of this hierarchy, two copies of active data are stored to provide full redundancy and excellent performance. In the lower level, RAID 5 parity protection is used to provide excellent storage cost for inactive data, at somewhat lower performance. The technology we describe in this article, known as HP AutoRAID, automatically and transparently manages migration of data blocks between these two levels as access patterns change. The result is a fully redundant storage system that is extremely easy to use, is suitable for a wide variety of workloads, is largely insensitive to dynamic workload changes, and performs much better than disk arrays with comparable numbers of spindles and much larger amounts of front-end RAM cache, Because the implementation of the HP AutoRAID technology is almost entirely in software, the additional hardware cost for these benefits is very small. We describe the HP AutoRAID technology in detail, provide performance data for an embodiment of it in a storage array, and summarize the results of simulation studies used to choose algorithms implemented in the array.","PeriodicalId":168455,"journal":{"name":"Proceedings of the fifteenth ACM symposium on Operating systems principles","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127169449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Rosenblum, Edouard Bugnion, S. Herrod, E. Witchel, Anoop Gupta
{"title":"The impact of architectural trends on operating system performance","authors":"M. Rosenblum, Edouard Bugnion, S. Herrod, E. Witchel, Anoop Gupta","doi":"10.1145/224056.224078","DOIUrl":"https://doi.org/10.1145/224056.224078","url":null,"abstract":"Computer systems are rapidly changing. Over the next few years, we will see wide-scale deployment of dynamically-scheduled processors that can issue multiple instructions every clock cycle, execute instructions out of order, and overlap computation and cache misses. We also expect clock-rates to increase, caches to grow, and multiprocessors to replace uniprocessors. Using SimOS, a complete machine simulation environment, this paper explores the impact of the above architectural trends on operating system performance. We present results based on the execution of large and realistic workloads (program development, transaction processing, and engineering compute-server) running on the IRIX 5.3 operating system from Silicon Graphics Inc. Looking at uniprocessor trends, we find that disk I/O is the first-order bottleneck for workloads such as program development and transaction processing. Its importance continues to grow over time. Ignoring I/O, we find that the memory system is the key bottleneck, stalling the CPU for over 50% of the execution time. Surprisingly, however, our results show that this stall fraction is unlikely to increase on future machines due to increased cache sizes and new latency hiding techniques in processors. We also find that the benefits of these architectural trends spread broadly across a majority of the important services provided by the operating system. We find the situation to be much worse for multiprocessors. Most operating systems services consume 30-70% more time than their uniprocessor counterparts. A large fraction of the stalls are due to coherence misses caused by communication between processors. Because larger caches do not reduce coherence misses, the performance gap between uniprocessor and multiprocessor performance will increase unless operating system developers focus on kernel restructuring to reduce unnecessary communication. The paper presents a detailed decomposition of execution time (e.g., instruction execution time, memory stall time separately for instructions and data, synchronization time) for important kernel services in the three workloads.","PeriodicalId":168455,"journal":{"name":"Proceedings of the fifteenth ACM symposium on Operating systems principles","volume":"183 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134191732","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
B. Bershad, S. Savage, P. Pardyak, E. G. Sirer, M. Fiuczynski, D. Becker, C. Chambers, S. Eggers
{"title":"Extensibility safety and performance in the SPIN operating system","authors":"B. Bershad, S. Savage, P. Pardyak, E. G. Sirer, M. Fiuczynski, D. Becker, C. Chambers, S. Eggers","doi":"10.1145/224056.224077","DOIUrl":"https://doi.org/10.1145/224056.224077","url":null,"abstract":"This paper describes the motivation, architecture and performance of SPIN, an extensible operating system. SPIN provides an extension infrastructure, together with a core set of extensible services, that allow applications to safely change the operating system's interface and implementation. Extensions allow an application to specialize the underlying operating system in order to achieve a particular level of performance and functionality. SPIN uses language and link-time mechanisms to inexpensively export fine-grained interfaces to operating system services. Extensions are written in a type safe language, and are dynamically linked into the operating system kernel. This approach offers extensions rapid access to system services, while protecting the operating system code executing within the kernel address space. SPIN and its extensions are written in Modula-3 and run on DEC Alpha workstations.","PeriodicalId":168455,"journal":{"name":"Proceedings of the fifteenth ACM symposium on Operating systems principles","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114223519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"U-Net: a user-level network interface for parallel and distributed computing","authors":"T. V. Eicken, A. Basu, Vineet Buch, W. Vogels","doi":"10.1145/224056.224061","DOIUrl":"https://doi.org/10.1145/224056.224061","url":null,"abstract":"The U-Net communication architecture provides processes with a virtual view of a network interface to enable userlevel access to high-speed communication devices. The architecture, implemented on standard workstations using offthe-shelf ATM communication hardware, removes the kernel from the communication path, while still providing full protection. The model presented by U-Net allows for the construction of protocols at user level whose performance is only limited by the capabilities of network. The architecture is extremely flexible in the sense that traditional protocols like TCP and UDP, as well as novel abstractions like Active Messages can be implemented efficiently. A U-Net prototype on an 8node ATM cluster of standard workstations offers 65 microseconds round-trip latency and 15 Mbytes/sec bandwidth. It achieves TCP performance at maximum network bandwidth and demonstrates performance equivalent to Meiko CS-2 and TMC CM-5 supercomputers on a set of Split-C benchmarks.","PeriodicalId":168455,"journal":{"name":"Proceedings of the fifteenth ACM symposium on Operating systems principles","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127441713","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Autonomous replication across wide-area internetworks","authors":"James Gwertzman, M. Seltzer","doi":"10.1145/224056.225836","DOIUrl":"https://doi.org/10.1145/224056.225836","url":null,"abstract":"The number of users connected to the Internet has been growing at an exponential rate, resulting in similar increases in network traffic and Internet server load. Advances in microprocessors and network technologies have kept up with growth so far, but we are reaching the limits of hardware solutions. In order for the Internet’s growth to continue, we must efficiently distribute server load and reduce the network traffic generated by its various services.","PeriodicalId":168455,"journal":{"name":"Proceedings of the fifteenth ACM symposium on Operating systems principles","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131594319","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
R. H. Patterson, Garth A. Gibson, E. Ginting, Daniel Stodolsky, J. Zelenka
{"title":"Informed prefetching and caching","authors":"R. H. Patterson, Garth A. Gibson, E. Ginting, Daniel Stodolsky, J. Zelenka","doi":"10.1145/224056.224064","DOIUrl":"https://doi.org/10.1145/224056.224064","url":null,"abstract":"The underutilization of disk parallelism and file cache buffers by traditional file systems induces I/O stall time that degrades the performance of modern microprocessor-based systems. In this paper, we present aggressive mechanisms that tailor file system resource management to the needs of I/O-intensive applications. In particular, we show how to use application-disclosed access patterns (hints) to expose and exploit I/O parallelism and to allocate dynamically file buffers among three competing demands: prefetching hinted blocks, caching hinted blocks for reuse, and caching recently used data for unhinted accesses. Our approach estimates the impact of alternative buffer allocations on application execution time and applies a cost-benefit analysis to allocate buffers where they will have the greatest impact. We implemented informed prefetching and caching in DEC''s OSF/1 operating system and measured its performance on a 150 MHz Alpha equipped with 15 disks running a range of applications including text search, 3D scientific visualization, relational database queries, speech recognition, and computational chemistry. Informed prefetching reduces the execution time of the first four of these applications by 20% to 87%. Informed caching reduces the execution time of the fifth application by up to 30%.","PeriodicalId":168455,"journal":{"name":"Proceedings of the fifteenth ACM symposium on Operating systems principles","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126328203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Object and native code thread mobility among heterogeneous computers","authors":"B. Steensgaard, E. Jul","doi":"10.1145/224056.224063","DOIUrl":"https://doi.org/10.1145/224056.224063","url":null,"abstract":"We present a technique for moving objects and threads among heterogeneous computers at the native code level. To enable mobility of threads running native code. we convert thread states among machine-dependent and machine-independent formats. We introduce the concept of bus stops, which are machine-independent representations of program points as represented by program counter values. The concept of bus stops can be used also for other purposes. e.g. to aid inspecting and debugging optimized code, garbage collection etc. We also discuss techniques for thread mobility among processors executing differently optimized codes. We demonstrate the viability of our ideas by providing a prototype implementation of object and thread mobility among heterogeneous computers. The prototype uses the Emerald distributed programming language without modification ; we have merely extended the Emerald runtime system and the code generator of the Emerald compiler. Our extensions allow object and thread mobility among VAX, Sun-3, HP9000/300, and Sun SPARC workstations. The excellent intra-node performance of the original homogeneous Emerald is retained : migrated threads run at native code speed before and after migration ; the same speed as on homogeneous Emerald and close to C code performance. Our implementation of mobility has not been optimized : thread mobility and trans-architecture invocations take about 60% longer than in the homogeneous implementation. We believe this is the first implementation of full object and thread mobility among heterogeneous computers with threads executing native code.","PeriodicalId":168455,"journal":{"name":"Proceedings of the fifteenth ACM symposium on Operating systems principles","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116232755","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A new page table for 64-bit address spaces","authors":"Madhusudhan Talluri, M. Hill, Y. Khalidi","doi":"10.1145/224056.224071","DOIUrl":"https://doi.org/10.1145/224056.224071","url":null,"abstract":"Most computer architectures are moving to 64-bit virtual address spaces. We first discuss how this change impacts conventional linear, forward-mapped, and hashed page tables. We then introduce a new page table data structure-clustered page table-that can be viewed as a hashed page table augmented with subblocking. Specifically, it associates mapping information for several pages (e.g., sixteen) with a single virtual tag and next pointer. Simulation results with several workloads show that clustered page tables use less memory than alternatives without adversely affecting page table access time. Since physical address space use is also increasing, computer architects are using new techniques-such as superpages, complete-subblocking, and partial-subblocking-to increase the memory mapped by a translation lookaside buffer (TLB). Since these techniques are completely ineffective without page table support, we next look at extending conventional and clustered page tables to support them. Simulation results show clustered page tables support medium-sized superpage and subblock TLBs especially well.","PeriodicalId":168455,"journal":{"name":"Proceedings of the fifteenth ACM symposium on Operating systems principles","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129246113","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A highly available scalable ITV system","authors":"M. Nelson, M. Linton, S. Owicki","doi":"10.1145/224056.224062","DOIUrl":"https://doi.org/10.1145/224056.224062","url":null,"abstract":"As part of Time Warner's interactive TV trial in Orlando, Florida, we have implemented mechanisms for the construction of highly available and scalable system services and applications. Our mechanisms rely on an underlying distributed objects architecture, similar to Spring[1]. We have extended a standard name service interface to provide selectors for choosing among service replicas and auditing to allow the automatic detection and removal of unresponsive objects from the name space. In addition, our system supports resource recovery, by letting servers detect client failures, and automated restart of failed services. Our experience has been that these mechanisms greatly simplify the development of services that are both highly available and scalable. The system was built in less than 15 months, is currently in a small number of homes, and will support the trial's 4,000 users later this year.","PeriodicalId":168455,"journal":{"name":"Proceedings of the fifteenth ACM symposium on Operating systems principles","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129516355","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}