ACM Transactions on Computer Systems最新文献_第8页

Selective replication: A lightweight technique for soft errors 选择性复制:针对软错误的轻量级技术

IF 1.5 4区计算机科学

ACM Transactions on Computer Systems Pub Date : 2009-12-01 DOI: 10.1145/1658357.1658359

X. Vera, J. Abella, J. Carretero, Antonio González

{"title":"Selective replication: A lightweight technique for soft errors","authors":"X. Vera, J. Abella, J. Carretero, Antonio González","doi":"10.1145/1658357.1658359","DOIUrl":"https://doi.org/10.1145/1658357.1658359","url":null,"abstract":"Soft errors are an important challenge in contemporary microprocessors. Modern processors have caches and large memory arrays protected by parity or error detection and correction codes. However, today's failure rate is dominated by flip flops, latches, and the increasing sensitivity of combinational logic to particle strikes. Moreover, as Chip Multi-Processors (CMPs) become ubiquitous, meeting the FIT budget for new designs is becoming a major challenge.\u0000 Solutions based on replicating threads have been explored deeply; however, their high cost in performance and energy make them unsuitable for current designs. Moreover, our studies based on a typical configuration for a modern processor show that focusing on the top 5 most vulnerable structures can provide up to 70% reduction in FIT rate. Therefore, full replication may overprotect the chip by reducing the FIT much below budget.\u0000 We propose Selective Replication, a lightweight-reconfigurable mechanism that achieves a high FIT reduction by protecting the most vulnerable instructions with minimal performance and energy impact. Low performance degradation is achieved by not requiring additional issue slots and reissuing instructions only during the time window between when they are retirable and they actually retire. Coverage can be reconfigured online by replicating only a subset of the instructions (the most vulnerable ones). Instructions' vulnerability is estimated based on the area they occupy and the time they spend in the issue queue. By changing the vulnerability threshold, we can adjust the trade-off between coverage and performance loss.\u0000 Results for an out-of-order processor configured similarly to Intel® Core#8482; Micro-Architecture show that our scheme can achieve over 65% FIT reduction with less than 4% performance degradation with small area and complexity overhead.","PeriodicalId":50918,"journal":{"name":"ACM Transactions on Computer Systems","volume":"40 1","pages":"8:1-8:30"},"PeriodicalIF":1.5,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80311933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 36

Sinfonia: A new paradigm for building scalable distributed systems Sinfonia:构建可扩展分布式系统的新范例

IF 1.5 4区计算机科学

ACM Transactions on Computer Systems Pub Date : 2009-11-01 DOI: 10.1145/1629087.1629088

M. Aguilera, A. Merchant, Mehul A. Shah, Alistair C. Veitch, C. Karamanolis

引用次数: 114

Automated anomaly detection and performance modeling of enterprise applications 企业应用程序的自动异常检测和性能建模

IF 1.5 4区计算机科学

ACM Transactions on Computer Systems Pub Date : 2009-11-01 DOI: 10.1145/1629087.1629089

L. Cherkasova, K. Ozonat, N. Mi, J. Symons, E. Smirni

{"title":"Automated anomaly detection and performance modeling of enterprise applications","authors":"L. Cherkasova, K. Ozonat, N. Mi, J. Symons, E. Smirni","doi":"10.1145/1629087.1629089","DOIUrl":"https://doi.org/10.1145/1629087.1629089","url":null,"abstract":"Automated tools for understanding application behavior and its changes during the application lifecycle are essential for many performance analysis and debugging tasks. Application performance issues have an immediate impact on customer experience and satisfaction. A sudden slowdown of enterprise-wide application can effect a large population of customers, lead to delayed projects, and ultimately can result in company financial loss. Significantly shortened time between new software releases further exacerbates the problem of thoroughly evaluating the performance of an updated application. Our thesis is that online performance modeling should be a part of routine application monitoring. Early, informative warnings on significant changes in application performance should help service providers to timely identify and prevent performance problems and their negative impact on the service. We propose a novel framework for automated anomaly detection and application change analysis. It is based on integration of two complementary techniques: (i) a regression-based transaction model that reflects a resource consumption model of the application, and (ii) an application performance signature that provides a compact model of runtime behavior of the application. The proposed integrated framework provides a simple and powerful solution for anomaly detection and analysis of essential performance changes in application behavior. An additional benefit of the proposed approach is its simplicity: It is not intrusive and is based on monitoring data that is typically available in enterprise production environments. The introduced solution further enables the automation of capacity planning and resource provisioning tasks of multitier applications in rapidly evolving IT environments.","PeriodicalId":50918,"journal":{"name":"ACM Transactions on Computer Systems","volume":"38 1","pages":"6:1-6:32"},"PeriodicalIF":1.5,"publicationDate":"2009-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88196143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 89

Practical and low-overhead masking of failures of TCP-based servers 实用且低开销的基于tcp的服务器故障屏蔽

IF 1.5 4区计算机科学

ACM Transactions on Computer Systems Pub Date : 2009-05-01 DOI: 10.1145/1534909.1534911

D. Zagorodnov, K. Marzullo, L. Alvisi, T. Bressoud

引用次数: 28

Distributed hash sketches: Scalable, efficient, and accurate cardinality estimation for distributed multisets 分布式散列草图:分布式多集的可伸缩、高效和准确的基数估计

IF 1.5 4区计算机科学

ACM Transactions on Computer Systems Pub Date : 2009-02-01 DOI: 10.1145/1482619.1482621

Nikos Ntarmos, P. Triantafillou, G. Weikum

{"title":"Distributed hash sketches: Scalable, efficient, and accurate cardinality estimation for distributed multisets","authors":"Nikos Ntarmos, P. Triantafillou, G. Weikum","doi":"10.1145/1482619.1482621","DOIUrl":"https://doi.org/10.1145/1482619.1482621","url":null,"abstract":"Counting items in a distributed system, and estimating the cardinality of multisets in particular, is important for a large variety of applications and a fundamental building block for emerging Internet-scale information systems. Examples of such applications range from optimizing query access plans in peer-to-peer data sharing, to computing the significance (rank/score) of data items in distributed information retrieval. The general formal problem addressed in this article is computing the network-wide distinct number of items with some property (e.g., distinct files with file name containing “spiderman”) where each node in the network holds an arbitrary subset, possibly overlapping the subsets of other nodes. The key requirements that a viable approach must satisfy are: (1) scalability towards very large network size, (2) efficiency regarding messaging overhead, (3) load balance of storage and access, (4) accuracy of the cardinality estimation, and (5) simplicity and easy integration in applications. This article contributes the DHS (Distributed Hash Sketches) method for this problem setting: a distributed, scalable, efficient, and accurate multiset cardinality estimator. DHS is based on hash sketches for probabilistic counting, but distributes the bits of each counter across network nodes in a judicious manner based on principles of Distributed Hash Tables, paying careful attention to fast access and aggregation as well as update costs. The article discusses various design choices, exhibiting tunable trade-offs between estimation accuracy, hop-count efficiency, and load distribution fairness. We further contribute a full-fledged, publicly available, open-source implementation of all our methods, and a comprehensive experimental evaluation for various settings.","PeriodicalId":50918,"journal":{"name":"ACM Transactions on Computer Systems","volume":"30 1","pages":"2:1-2:53"},"PeriodicalIF":1.5,"publicationDate":"2009-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90480440","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 24

Hill-climbing SMT processor resource distribution 爬坡式SMT处理器资源分布

IF 1.5 4区计算机科学

ACM Transactions on Computer Systems Pub Date : 2009-02-01 DOI: 10.1145/1482619.1482620

Seungryul Choi, D. Yeung

{"title":"Hill-climbing SMT processor resource distribution","authors":"Seungryul Choi, D. Yeung","doi":"10.1145/1482619.1482620","DOIUrl":"https://doi.org/10.1145/1482619.1482620","url":null,"abstract":"The key to high performance in Simultaneous MultiThreaded (SMT) processors lies in optimizing the distribution of shared resources to active threads. Existing resource distribution techniques optimize performance only indirectly. They infer potential performance bottlenecks by observing indicators, like instruction occupancy or cache miss counts, and take actions to try to alleviate them. While the corrective actions are designed to improve performance, their actual performance impact is not known since end performance is never monitored. Consequently, potential performance gains are lost whenever the corrective actions do not effectively address the actual bottlenecks occurring in the pipeline.\u0000 We propose a different approach to SMT resource distribution that optimizes end performance directly. Our approach observes the impact that resource distribution decisions have on performance at runtime, and feeds this information back to the resource distribution mechanisms to improve future decisions. By evaluating many different resource distributions, our approach tries to learn the best distribution over time. Because we perform learning online, learning time is crucial. We develop a hill-climbing algorithm that quickly learns the best distribution of resources by following the performance gradient within the resource distribution space. We also develop several ideal learning algorithms to enable deeper insights through limit studies.\u0000 This article conducts an in-depth investigation of hill-climbing SMT resource distribution using a comprehensive suite of 63 multiprogrammed workloads. Our results show hill-climbing outperforms ICOUNT, FLUSH, and DCRA (three existing SMT techniques) by 11.4%, 11.5%, and 2.8%, respectively, under the weighted IPC metric. A limit study conducted using our ideal learning algorithms shows our approach can potentially outperform the same techniques by 19.2%, 18.0%, and 7.6%, respectively, thus demonstrating additional room exists for further improvement. Using our ideal algorithms, we also identify three bottlenecks that limit online learning speed: local maxima, phased behavior, and interepoch jitter. We define metrics to quantify these learning bottlenecks, and characterize the extent to which they occur in our workloads. Finally, we conduct a sensitivity study, and investigate several extensions to improve our hill-climbing technique.","PeriodicalId":50918,"journal":{"name":"ACM Transactions on Computer Systems","volume":"22 1","pages":"1:1-1:47"},"PeriodicalIF":1.5,"publicationDate":"2009-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85472037","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 35

Improving peer-to-peer performance through server-side scheduling 通过服务器端调度提高点对点性能

IF 1.5 4区计算机科学

ACM Transactions on Computer Systems Pub Date : 2008-12-01 DOI: 10.1145/1455258.1455260

Y. Qiao, F. Bustamante, P. Dinda, S. Birrer, Dong Lu

引用次数: 8

Adaptive work-stealing with parallelism feedback 具有并行反馈的自适应工作窃取

IF 1.5 4区计算机科学

ACM Transactions on Computer Systems Pub Date : 2008-09-01 DOI: 10.1145/1394441.1394443

Kunal Agrawal, C. Leiserson, Yuxiong He, W. Hsu

{"title":"Adaptive work-stealing with parallelism feedback","authors":"Kunal Agrawal, C. Leiserson, Yuxiong He, W. Hsu","doi":"10.1145/1394441.1394443","DOIUrl":"https://doi.org/10.1145/1394441.1394443","url":null,"abstract":"Multiprocessor scheduling in a shared multiprogramming environment can be structured as two-level scheduling, where a kernel-level job scheduler allots processors to jobs and a user-level thread scheduler schedules the work of a job on its allotted processors. We present a randomized work-stealing thread scheduler for fork-join multithreaded jobs that provides continual parallelism feedback to the job scheduler in the form of requests for processors. Our A-STEAL algorithm is appropriate for large parallel servers where many jobs share a common multiprocessor resource and in which the number of processors available to a particular job may vary during the job's execution. Assuming that the job scheduler never allots a job more processors than requested by the job's thread scheduler, A-STEAL guarantees that the job completes in near-optimal time while utilizing at least a constant fraction of the allotted processors.\u0000 We model the job scheduler as the thread scheduler's adversary, challenging the thread scheduler to be robust to the operating environment as well as to the job scheduler's administrative policies. For example, the job scheduler might make a large number of processors available exactly when the job has little use for them. To analyze the performance of our adaptive thread scheduler under this stringent adversarial assumption, we introduce a new technique called trim analysis, which allows us to prove that our thread scheduler performs poorly on no more than a small number of time steps, exhibiting near-optimal behavior on the vast majority.\u0000 More precisely, suppose that a job has work T1 and span T∞. On a machine with P processors, A-STEAL completes the job in an expected duration of O(T1/&Ptilde; + T∞ + L lg P) time steps, where L is the length of a scheduling quantum, and &Ptilde; denotes the O(T∞ + L lg P)-trimmed availability. This quantity is the average of the processor availability over all time steps except the O(T∞ + L lg P) time steps that have the highest processor availability. When the job's parallelism dominates the trimmed availability, that is, &Ptilde; ≪ T1/T∞, the job achieves nearly perfect linear speedup. Conversely, when the trimmed mean dominates the parallelism, the asymptotic running time of the job is nearly the length of its span, which is optimal.\u0000 We measured the performance of A-STEAL on a simulated multiprocessor system using synthetic workloads. For jobs with sufficient parallelism, our experiments confirm that A-STEAL provides almost perfect linear speedup across a variety of processor availability profiles. We compared A-STEAL with the ABP algorithm, an adaptive work-stealing thread scheduler developed by Arora et al. [1998] which does not employ parallelism feedback. On moderately to heavily loaded machines with large numbers of processors, A-STEAL typically completed jobs more than twice as quickly as ABP, despite being allotted the same number or fewer processors on every step, while wasting only 10%","PeriodicalId":50918,"journal":{"name":"ACM Transactions on Computer Systems","volume":"160 38 1","pages":"7:1-7:32"},"PeriodicalIF":1.5,"publicationDate":"2008-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89128278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 20

A stateless approach to connection-oriented protocols 面向连接协议的无状态方法

IF 1.5 4区计算机科学

ACM Transactions on Computer Systems Pub Date : 2008-09-01 DOI: 10.1145/1394441.1394444

Alan Shieh, A. Myers, E. G. Sirer

{"title":"A stateless approach to connection-oriented protocols","authors":"Alan Shieh, A. Myers, E. G. Sirer","doi":"10.1145/1394441.1394444","DOIUrl":"https://doi.org/10.1145/1394441.1394444","url":null,"abstract":"Traditional operating system interfaces and network protocol implementations force some system state to be kept on both sides of a connection. This state ties the connection to its endpoints, impedes transparent failover, permits denial-of-service attacks, and limits scalability. This article introduces a novel TCP-like transport protocol and a new interface to replace sockets that together enable all state to be kept on one endpoint, allowing the other endpoint, typically the server, to operate without any per-connection state. Called Trickles, this approach enables servers to scale well with increasing numbers of clients, consume fewer resources, and better resist denial-of-service attacks. Measurements on a full implementation in Linux indicate that Trickles achieves performance comparable to TCP/IP, interacts well with other flows, and scales well. Trickles also enables qualitatively different kinds of networked services. Services can be geographically replicated and contacted through an anycast primitive for improved availability and performance. Widely-deployed practices that currently have client-observable side effects, such as periodic server reboots, connection redirection, and failover, can be made transparent, and perform well, under Trickles. The protocol is secure against tampering and replay attacks, and the client interface is backward-compatible, requiring no changes to sockets-based client applications.","PeriodicalId":50918,"journal":{"name":"ACM Transactions on Computer Systems","volume":"9 1","pages":"8:1-8:50"},"PeriodicalIF":1.5,"publicationDate":"2008-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87129931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 15

RaWMS - Random Walk Based Lightweight Membership Service for Wireless Ad Hoc Networks RaWMS——无线自组织网络中基于随机游动的轻量级会员服务

IF 1.5 4区计算机科学

ACM Transactions on Computer Systems Pub Date : 2008-06-01 DOI: 10.1145/1365815.1365817

Ziv Bar-Yossef, R. Friedman, G. Kliot

引用次数: 82