International Conference on Dependable Systems and Networks (DSN'06): Latest Papers (Page 5)

A Contribution Towards Solving the Web Workload Puzzle

International Conference on Dependable Systems and Networks (DSN'06) Pub Date : 2006-06-25 DOI: 10.1109/DSN.2006.2

K. Goseva-Popstojanova, Fengbin Li, Xuan Wang, A. Sangle

Abstract: World Wide Web, the biggest distributed system ever built, experiences tremendous growth and change in Web sites, users, and technology. A realistic and accurate characterization of Web workload is the first, fundamental step in areas such as performance analysis and prediction, capacity planning, and admission control. Compared to the previous work, in this paper we present more detailed and rigorous statistical analysis of both request and session level characteristics of Web workload based on empirical data extracted from actual logs of four Web servers. Our analysis is focused on exploring phenomena such as self-similarity, long-range dependence, and heavy-tailed distributions. Identification of these phenomena in real data is a challenging task since the existing methods may perform erratically in practice and produce misleading results. We provide more accurate analysis of long-range dependence of the request and session arrival processes by removing the trend and periodicity. In addition to the session arrival process (i.e., inter-session characteristics), we study several intra-session characteristics using several different methods to test the existence of heavy-tailed behavior and cross validate the results. Finally, we point out specific problems associated with the methods used for establishing long-range dependence and heavy-tailed behavior of Web workloads. We believe that the comprehensive model presented in this paper is a step towards solving the Web workload puzzle.
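The abstract refers to testing for heavy-tailed behavior without naming the methods used. As an illustration only, the sketch below applies the Hill estimator, one widely used tail-index estimator, to synthetic Pareto data. The choice of estimator, the function names `hill_tail_index` and `pareto_sample`, and the parameters `k`, `alpha`, and `seed` are assumptions made for this example and are not taken from the paper.

```python
import math
import random


def hill_tail_index(samples, k):
    """Hill estimate of the tail index alpha from the k largest samples.

    Illustrative sketch; not the procedure used in the paper.
    """
    if not 0 < k < len(samples):
        raise ValueError("k must satisfy 0 < k < len(samples)")
    ordered = sorted(samples, reverse=True)
    threshold = ordered[k]  # the (k+1)-th largest observation
    if threshold <= 0:
        raise ValueError("the threshold observation must be positive")
    # Mean log-excess of the k largest values over the threshold.
    gamma = sum(math.log(x / threshold) for x in ordered[:k]) / k
    if gamma <= 0:
        raise ValueError("degenerate sample: no spread above the threshold")
    return 1.0 / gamma


def pareto_sample(alpha, n, seed=0):
    """Synthetic heavy-tailed data (hypothetical stand-in for real log data)."""
    rng = random.Random(seed)
    return [rng.paretovariate(alpha) for _ in range(n)]


if __name__ == "__main__":
    data = pareto_sample(alpha=1.5, n=20000)
    print(hill_tail_index(data, k=2000))
```

An estimate below 2 corresponds to a distribution with infinite variance. In practice the estimate depends on the choice of `k`, which is one reason a single method can mislead and results are usually cross-validated.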

Citations: 34

BlueGene/L Failure Analysis and Prediction Models

International Conference on Dependable Systems and Networks (DSN'06) Pub Date : 2006-06-25 DOI: 10.1109/DSN.2006.18

Yinglung Liang, Yanyong Zhang, A. Sivasubramaniam, M. Jette, R. Sahoo

Abstract: The growing computational and storage needs of several scientific applications mandate the deployment of extreme-scale parallel machines, such as IBM's BlueGene/L which can accommodate as many as 128 K processors. One of the challenges when designing and deploying these systems in a production setting is the need to take failure occurrences, whether it be in the hardware or in the software, into account. Earlier work has shown that conventional runtime fault-tolerant techniques such as periodic checkpointing are not effective to the emerging systems. Instead, the ability to predict failure occurrences can help develop more effective checkpointing strategies. Failure prediction has long been regarded as a challenging research problem, mainly due to the lack of realistic failure data from actual production systems. In this study, we have collected RAS event logs from BlueGene/L over a period of more than 100 days. We have investigated the characteristics of fatal failure events, as well as the correlation between fatal events and non-fatal events. Based on the observations, we have developed three simple yet effective failure prediction methods, which can predict around 80% of the memory and network failures, and 47% of the application I/O failures.

Citations: 292

A Performance Study on the Signal-On-Fail Approach to Imposing Total Order in the Streets of Byzantium

International Conference on Dependable Systems and Networks (DSN'06) Pub Date : 2006-06-25 DOI: 10.1109/DSN.2006.7

Qurat-ul-Ain Inayat, P. Ezhilchelvan

Citations: 10

Collecting and Analyzing Failure Data of Bluetooth Personal Area Networks

International Conference on Dependable Systems and Networks (DSN'06) Pub Date : 2006-06-25 DOI: 10.1109/DSN.2006.20

M. Cinque, Domenico Cotroneo, S. Russo

Citations: 23

Performance Assurance via Software Rejuvenation: Monitoring, Statistics and Algorithms

International Conference on Dependable Systems and Networks (DSN'06) Pub Date : 2006-06-25 DOI: 10.1109/DSN.2006.58

Alberto Avritzer, A. Bondi, Michael Grottke, Kishor S. Trivedi, E. Weyuker

Citations: 63

Evaluating the Performability of Systems with Background Jobs

International Conference on Dependable Systems and Networks (DSN'06) Pub Date : 2006-06-25 DOI: 10.1109/DSN.2006.33

Qi Zhang, N. Mi, E. Smirni, Alma Riska, E. Riedel

Abstract: As most computer systems are expected to remain operational 24 hours a day, 7 days a week, they must complete maintenance work while in operation. This work is in addition to the regular tasks of the system and its purpose is to improve system reliability and availability. Nonetheless, additional work in the system, although labeled as best effort or low priority, still affects the performance of foreground tasks, especially if background/foreground work is non-preemptive. In this paper, we propose an analytic model to evaluate the performance trade-offs of the amount of background work that a storage system can sustain. The proposed model results in a quasi-birth-death (QBD) process that is analytically tractable. Detailed experimentation using a variety of workloads shows that under dependent arrivals both foreground and background performance strongly depends on system load. In contrast, if arrivals of foreground jobs are independent, performance sensitivity to load is reduced. The model identifies dependence in the arrivals of foreground jobs as an important characteristic that controls the decision of how much background load the system can accept to maintain high availability and performance gains.

Citations: 6

Secure Split Assignment Trajectory Sampling: A Malicious Router Detection System

International Conference on Dependable Systems and Networks (DSN'06) Pub Date : 2006-06-25 DOI: 10.1109/DSN.2006.64

Sihyung Lee, Tina Wong, Hyong S. Kim

Citations: 23

Efficiently Detecting All Dangling Pointer Uses in Production Servers

International Conference on Dependable Systems and Networks (DSN'06) Pub Date : 2006-06-25 DOI: 10.1109/DSN.2006.31

Dinakar Dhurjati, Vikram S. Adve

Abstract: In this paper, we propose a novel technique to detect all dangling pointer uses at run-time that is efficient enough for production use in server codes. One idea (previously used by electric fence, PageHeap) is to use a new virtual page for each allocation of the program and rely on page protection mechanisms to check dangling pointer accesses. This naive approach has two limitations that make it impractical to use in production software: increased physical memory usage and increased address space usage. We propose two key improvements that alleviate both these problems. First, we use a new virtual page for each allocation of the program but map it to the same physical page as the original allocator. This allows using nearly identical physical memory as the original program while still retaining the dangling pointer detection capability. We also show how to implement this idea without requiring any changes to the underlying memory allocator. Our second idea alleviates the problem of virtual address space exhaustion by using a previously developed compiler transformation called automatic pool allocation to reuse many virtual pages. The transformation partitions the memory of the program based on their lifetimes and allows us to reuse virtual pages when portions of memory become inaccessible. Experimentally we find that the run-time overhead for five Unix servers is less than 4%, for other Unix utilities less than 15%. However, in case of allocation intensive benchmarks, we find our overheads are much worse (up to 11x slowdown).
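The first idea in the abstract, a fresh virtual page that shares physical storage with the original allocation, can be illustrated loosely with two memory mappings of the same backing file. This is a rough analogy under stated assumptions, not the authors' implementation: the function name `alias_demo` and the variables `canonical` and `shadow` are hypothetical, and because Python's standard `mmap` module does not expose page protection, closing the shadow mapping stands in here for protecting the page at free time.

```python
import mmap
import tempfile


def alias_demo():
    """Map one page of backing storage at two virtual addresses.

    Illustrative analogy only; a real detector would protect the freed
    virtual page so that a stale access faults in hardware.
    """
    page = mmap.PAGESIZE
    with tempfile.TemporaryFile() as backing:
        backing.truncate(page)  # one page of shared backing storage
        canonical = mmap.mmap(backing.fileno(), page)  # the allocator's view
        shadow = mmap.mmap(backing.fileno(), page)     # per-allocation alias

        # A write through the alias is visible through the canonical view,
        # because both mappings are backed by the same storage.
        shadow[0:5] = b"hello"
        seen_via_canonical = bytes(canonical[0:5])

        # "Free" the allocation: invalidate only the alias.
        shadow.close()
        try:
            shadow[0]  # stale use of the freed alias
            stale_use_caught = False
        except ValueError:
            stale_use_caught = True

        canonical.close()
    return seen_via_canonical, stale_use_caught


if __name__ == "__main__":
    print(alias_demo())
```

The point of the aliasing is that invalidating one mapping leaves the other usable, so the underlying storage can be recycled while stale references through the old alias are still caught.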

Citations: 103

Improving BGP Convergence Delay for Large-Scale Failures

International Conference on Dependable Systems and Networks (DSN'06) Pub Date : 2006-06-25 DOI: 10.1109/DSN.2006.41

A. Sahoo, K. Kant, P. Mohapatra

Citations: 29

A Dependable System Architecture for Safety-Critical Respiratory-Gated Radiation Therapy

International Conference on Dependable Systems and Networks (DSN'06) Pub Date : 2006-06-25 DOI: 10.1109/DSN.2006.3

G. Sharp, Nagarajan Kandasamy

Citations: 4