International Conference on Dependable Systems and Networks (DSN'06): Latest Publications

A Contribution Towards Solving the Web Workload Puzzle
International Conference on Dependable Systems and Networks (DSN'06) Pub Date : 2006-06-25 DOI: 10.1109/DSN.2006.2
K. Goseva-Popstojanova, Fengbin Li, Xuan Wang, A. Sangle
Abstract: The World Wide Web, the biggest distributed system ever built, experiences tremendous growth and change in Web sites, users, and technology. A realistic and accurate characterization of Web workload is the first, fundamental step in areas such as performance analysis and prediction, capacity planning, and admission control. Compared to previous work, this paper presents a more detailed and rigorous statistical analysis of both request-level and session-level characteristics of Web workload, based on empirical data extracted from actual logs of four Web servers. Our analysis focuses on exploring phenomena such as self-similarity, long-range dependence, and heavy-tailed distributions. Identifying these phenomena in real data is a challenging task, since the existing methods may perform erratically in practice and produce misleading results. We provide a more accurate analysis of long-range dependence of the request and session arrival processes by removing the trend and periodicity. In addition to the session arrival process (i.e., inter-session characteristics), we study several intra-session characteristics, using several different methods to test for heavy-tailed behavior and cross-validate the results. Finally, we point out specific problems associated with the methods used for establishing long-range dependence and heavy-tailed behavior of Web workloads. We believe that the comprehensive model presented in this paper is a step towards solving the Web workload puzzle.
Citations: 34
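Tests for heavy-tailed behavior of the kind this abstract mentions are often illustrated with the Hill estimator of the tail index. The abstract does not name the methods the authors used, so the Python sketch below is only an assumed example of one such method, not a reconstruction of the paper's analysis. The names `hill_tail_index` and `pareto_quantiles` are hypothetical, and the input is a synthetic Pareto sample rather than real Web log data.

```python
import math
from typing import List, Sequence


def hill_tail_index(samples: Sequence[float], k: int) -> float:
    """Hill estimate of the tail index alpha, using the k largest samples.

    Assumption: this is one textbook estimator, chosen for illustration only.
    """
    if not 0 < k < len(samples):
        raise ValueError("k must satisfy 0 < k < len(samples)")
    ordered = sorted(samples, reverse=True)
    threshold = ordered[k]  # the (k+1)-th largest observation
    if threshold <= 0:
        raise ValueError("the threshold order statistic must be positive")
    # Average log-excess of the k largest values over the threshold.
    mean_log_excess = sum(math.log(x / threshold) for x in ordered[:k]) / k
    return 1.0 / mean_log_excess


def pareto_quantiles(alpha: float, n: int) -> List[float]:
    """Deterministic stand-in for a Pareto(alpha) sample: quantiles on an even grid."""
    return [((i + 0.5) / n) ** (-1.0 / alpha) for i in range(n)]


if __name__ == "__main__":
    synthetic = pareto_quantiles(alpha=1.5, n=1000)
    print(hill_tail_index(synthetic, k=100))
```

In practice the estimate depends strongly on the choice of `k`, which is one reason such methods can mislead on real data and why cross-validating several methods, as the abstract describes, is useful.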
BlueGene/L Failure Analysis and Prediction Models
International Conference on Dependable Systems and Networks (DSN'06) Pub Date : 2006-06-25 DOI: 10.1109/DSN.2006.18
Yinglung Liang, Yanyong Zhang, A. Sivasubramaniam, M. Jette, R. Sahoo
Abstract: The growing computational and storage needs of several scientific applications mandate the deployment of extreme-scale parallel machines, such as IBM's BlueGene/L, which can accommodate as many as 128K processors. One of the challenges in designing and deploying these systems in a production setting is the need to take failure occurrences, whether in the hardware or in the software, into account. Earlier work has shown that conventional runtime fault-tolerance techniques such as periodic checkpointing are not effective for these emerging systems. Instead, the ability to predict failure occurrences can help develop more effective checkpointing strategies. Failure prediction has long been regarded as a challenging research problem, mainly due to the lack of realistic failure data from actual production systems. In this study, we collected RAS event logs from BlueGene/L over a period of more than 100 days. We investigated the characteristics of fatal failure events, as well as the correlation between fatal and non-fatal events. Based on these observations, we developed three simple yet effective failure prediction methods, which can predict around 80% of the memory and network failures and 47% of the application I/O failures.
Citations: 292
A Performance Study on the Signal-On-Fail Approach to Imposing Total Order in the Streets of Byzantium
International Conference on Dependable Systems and Networks (DSN'06) Pub Date : 2006-06-25 DOI: 10.1109/DSN.2006.7
Qurat-ul-Ain Inayat, P. Ezhilchelvan
Abstract: Any asynchronous total-order protocol must somehow circumvent the well-known FLP impossibility result. This paper exposes the performance gains obtained when this impossibility is dealt with through the use of abstract processes built to have some special failure semantics. Specifically, we build processes with signal-on-fail semantics by (i) having a subset of Byzantine-prone processes paired to check each other's computational outputs, and (ii) assuming that paired processes do not fail simultaneously. By dynamically invoking the construction of signal-on-fail processes, we develop coordinator-based total-order protocols that allow fewer than one-third of the processes to fail in a Byzantine manner. Using a LAN-based implementation, failure-free order latencies and fail-over latencies are measured; the former are shown to be smaller than those of the protocol of Castro and Liskov, which is generally regarded as performing exceedingly well in best-case scenarios.
Citations: 10
Collecting and Analyzing Failure Data of Bluetooth Personal Area Networks
International Conference on Dependable Systems and Networks (DSN'06) Pub Date : 2006-06-25 DOI: 10.1109/DSN.2006.20
M. Cinque, Domenico Cotroneo, S. Russo
Abstract: This work presents a failure data analysis campaign on Bluetooth personal area networks (PANs), conducted on two kinds of heterogeneous testbeds that were in operation for more than one year. The results reveal how the failure distribution is characterized and suggest how to improve the dependability of Bluetooth PANs. Specifically, we define the failure model and then identify the most effective recovery actions and masking strategies that can be adopted for each failure. We then integrate the discovered recovery actions and masking strategies into our testbeds, improving availability by 3.64% (up to 36.6%) and reliability by 202% (measured as mean time to failure).
Citations: 23
Performance Assurance via Software Rejuvenation: Monitoring, Statistics and Algorithms
International Conference on Dependable Systems and Networks (DSN'06) Pub Date : 2006-06-25 DOI: 10.1109/DSN.2006.58
Alberto Avritzer, A. Bondi, Michael Grottke, Kishor S. Trivedi, E. Weyuker
Abstract: We present three algorithms for detecting the need for software rejuvenation by monitoring the changing values of a customer-affecting performance metric, such as response time. Applying these algorithms can improve the values of this customer-affecting metric by triggering rejuvenation before performance degradation becomes severe. The algorithms differ in the way they gather and use sample values to arrive at a rejuvenation decision. Their effectiveness is evaluated by simulation for different sets of control parameters, including sample size. The results show that applying the algorithms with suitable choices of control parameters can significantly improve system performance as measured by response time.
Citations: 63
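The general idea in this abstract, deciding when to rejuvenate from sampled response times, can be sketched as a simple windowed threshold check. The abstract does not specify the three algorithms, so the following Python sketch is a deliberately simplified assumption, not any of the paper's methods. The class name `RejuvenationMonitor` and the parameters `target_mean`, `tolerance`, and `sample_size` are hypothetical.

```python
from collections import deque
from statistics import mean


class RejuvenationMonitor:
    """Signals rejuvenation when the mean of the last N response times is too high.

    Assumption: a plain sample-mean threshold rule, used here only to
    illustrate monitoring a customer-affecting metric.
    """

    def __init__(self, target_mean: float, tolerance: float, sample_size: int) -> None:
        if sample_size < 1:
            raise ValueError("sample_size must be at least 1")
        self.threshold = target_mean + tolerance
        self.window = deque(maxlen=sample_size)

    def observe(self, response_time: float) -> bool:
        """Record one response time; return True if rejuvenation should be triggered."""
        self.window.append(response_time)
        if len(self.window) < self.window.maxlen:
            return False  # not enough samples for a decision yet
        if mean(self.window) > self.threshold:
            self.window.clear()  # start fresh after the rejuvenation
            return True
        return False


if __name__ == "__main__":
    monitor = RejuvenationMonitor(target_mean=1.0, tolerance=0.5, sample_size=3)
    for t in (1.0, 1.0, 1.0, 3.0):
        print(t, monitor.observe(t))
```

A larger `sample_size` makes the rule less sensitive to short spikes but slower to react, which is the kind of control-parameter trade-off the abstract says was evaluated by simulation.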
Evaluating the Performability of Systems with Background Jobs
International Conference on Dependable Systems and Networks (DSN'06) Pub Date : 2006-06-25 DOI: 10.1109/DSN.2006.33
Qi Zhang, N. Mi, E. Smirni, Alma Riska, E. Riedel
Abstract: As most computer systems are expected to remain operational 24 hours a day, 7 days a week, they must complete maintenance work while in operation. This work is in addition to the regular tasks of the system, and its purpose is to improve system reliability and availability. Nonetheless, additional work in the system, although labeled as best effort or low priority, still affects the performance of foreground tasks, especially if background/foreground work is non-preemptive. In this paper, we propose an analytic model to evaluate the performance trade-offs of the amount of background work that a storage system can sustain. The proposed model results in a quasi-birth-death (QBD) process that is analytically tractable. Detailed experimentation using a variety of workloads shows that, under dependent arrivals, both foreground and background performance strongly depend on system load. In contrast, if arrivals of foreground jobs are independent, performance sensitivity to load is reduced. The model identifies dependence in the arrivals of foreground jobs as an important characteristic that controls the decision of how much background load the system can accept to maintain high availability and performance gains.
Citations: 6
Secure Split Assignment Trajectory Sampling: A Malicious Router Detection System
International Conference on Dependable Systems and Networks (DSN'06) Pub Date : 2006-06-25 DOI: 10.1109/DSN.2006.64
Sihyung Lee, Tina Wong, Hyong S. Kim
Abstract: Routing infrastructure plays a vital role in the Internet, and attacks on routers can be damaging. Compromised routers can drop, modify, mis-forward, or reorder valid packets. Existing proposals for secure forwarding require substantial computational overhead and additional capabilities at routers. We propose secure split assignment trajectory sampling (SATS), a system that detects malicious routers on the data plane. SATS locates a set of suspicious routers when packets do not follow their predicted paths. It works with a traffic measurement platform using packet sampling, has low overhead on routers, and is applicable to high-speed networks. Different subsets of packets are sampled over different groups of routers to ensure that an attacker cannot completely evade detection. Our evaluation shows that SATS can significantly limit a malicious router's harm to a small portion of traffic in a network.
Citations: 23
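The abstract's core mechanism, sampling packets consistently along a path and flagging routers where a packet fails to show up, can be sketched with hash-based sampling. The abstract gives no details of the hash function, the assignment of sampling ranges, or the reporting format, so everything in this Python sketch is an assumption: `packet_bucket`, `should_sample`, `missing_hops`, the per-router hash ranges, and the use of SHA-256 are all hypothetical choices for illustration.

```python
import hashlib
from typing import Dict, List, Set, Tuple

BUCKETS = 100  # assumed size of the hash space


def packet_bucket(invariant: bytes) -> int:
    """Map the invariant part of a packet to a bucket in [0, BUCKETS)."""
    digest = hashlib.sha256(invariant).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


def should_sample(invariant: bytes, assigned: Tuple[int, int]) -> bool:
    """A router samples a packet when its bucket falls in the router's half-open range."""
    low, high = assigned
    return low <= packet_bucket(invariant) < high


def missing_hops(
    predicted_path: List[str],
    ranges: Dict[str, Tuple[int, int]],
    reports: Dict[str, Set[bytes]],
    invariant: bytes,
) -> List[str]:
    """Routers on the predicted path that should have sampled the packet but did not report it."""
    expected = [r for r in predicted_path if should_sample(invariant, ranges[r])]
    return [r for r in expected if invariant not in reports.get(r, set())]


if __name__ == "__main__":
    pkt = b"src=10.0.0.1;dst=10.0.0.2;id=7"
    ranges = {"r1": (0, 100), "r2": (0, 100), "r3": (0, 100)}
    reports = {"r1": {pkt}, "r2": {pkt}}  # r3 never saw the packet
    print(missing_hops(["r1", "r2", "r3"], ranges, reports, pkt))
```

Giving different groups of routers different ranges means a compromised router cannot tell from its own range which packets are being watched elsewhere, which is the intuition behind sampling different subsets over different router groups.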
Efficiently Detecting All Dangling Pointer Uses in Production Servers
International Conference on Dependable Systems and Networks (DSN'06) Pub Date : 2006-06-25 DOI: 10.1109/DSN.2006.31
Dinakar Dhurjati, Vikram S. Adve
Abstract: In this paper, we propose a novel technique to detect all dangling pointer uses at run-time that is efficient enough for production use in server codes. One idea (previously used by Electric Fence and PageHeap) is to use a new virtual page for each allocation of the program and rely on page protection mechanisms to check dangling pointer accesses. This naive approach has two limitations that make it impractical to use in production software: increased physical memory usage and increased address space usage. We propose two key improvements that alleviate both these problems. First, we use a new virtual page for each allocation of the program but map it to the same physical page as the original allocator. This allows using nearly identical physical memory as the original program while still retaining the dangling pointer detection capability. We also show how to implement this idea without requiring any changes to the underlying memory allocator. Our second idea alleviates the problem of virtual address space exhaustion by using a previously developed compiler transformation called automatic pool allocation to reuse many virtual pages. The transformation partitions the memory of the program based on lifetimes and allows us to reuse virtual pages when portions of memory become inaccessible. Experimentally, we find that the run-time overhead is less than 4% for five Unix servers and less than 15% for other Unix utilities. However, for allocation-intensive benchmarks, our overheads are much worse (up to 11x slowdown).
Citations: 103
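The first improvement in this abstract relies on several virtual pages being backed by the same physical memory. The Python sketch below illustrates only that aliasing property, using two shared mappings of one temporary file; it is an assumed analogy, not the authors' implementation, and the page-protection step that actually catches dangling accesses is not attempted here. The function name `alias_demo` is hypothetical.

```python
import mmap
import tempfile

PAGE = mmap.PAGESIZE


def alias_demo() -> bytes:
    """Write through one mapping and read the same bytes back through a second one."""
    with tempfile.TemporaryFile() as backing:
        backing.truncate(PAGE)  # one page of zero-filled backing storage
        first = mmap.mmap(backing.fileno(), PAGE)   # first virtual view
        second = mmap.mmap(backing.fileno(), PAGE)  # second view of the same page
        try:
            first[0:5] = b"hello"
            return bytes(second[0:5])
        finally:
            first.close()
            second.close()


if __name__ == "__main__":
    print(alias_demo())
```

Because both views share the underlying page, adding views costs address space rather than extra physical memory, which matches the trade-off the abstract describes before it turns to reusing virtual pages.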
Improving BGP Convergence Delay for Large-Scale Failures
International Conference on Dependable Systems and Networks (DSN'06) Pub Date : 2006-06-25 DOI: 10.1109/DSN.2006.41
A. Sahoo, K. Kant, P. Mohapatra
Abstract: Border Gateway Protocol (BGP) is the standard routing protocol used in the Internet for routing packets between autonomous systems (ASes). It is known that BGP can take hundreds of seconds to converge after isolated failures. We have also observed that the convergence delay can be even greater for large-scale failures. In this study, we first investigate some of the factors affecting the convergence delay and their relative impacts. We observe that the minimum route advertisement interval (MRAI) and the processing overhead at the routers during re-convergence have a significant effect on the BGP recovery time. We propose two new schemes to reduce processing overload at BGP routers during large failures, which in turn leads to decreased convergence delays. We show that these schemes, combined with tuning of the MRAI value, decrease the BGP convergence delay significantly and can thus limit the impact of large-scale failures in the Internet.
Citations: 29
A Dependable System Architecture for Safety-Critical Respiratory-Gated Radiation Therapy
International Conference on Dependable Systems and Networks (DSN'06) Pub Date : 2006-06-25 DOI: 10.1109/DSN.2006.3
G. Sharp, Nagarajan Kandasamy
Abstract: This experience report describes the design and implementation of safety-critical software and hardware for respiratory gating of a medical linear accelerator. Respiratory gating refers to a radiotherapy technique for treating cancer in the lung, liver, and abdomen, where tumors move while a patient breathes. A computer program tracks the position of the tumor within the human body using X-ray fluoroscopy. When the tumor is in the correct position, the linear accelerator is triggered, delivering a beam of radiation toward the target. As part of the gating system, a comprehensive strategy for safety has been developed. This paper describes these safety features, focusing on the online monitoring techniques used to confirm the proper operation of the fluoroscopic imaging panels and the pattern recognition algorithms used for tumor identification.
Citations: 4