IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012): Latest Publications, Page 2

A dependability analysis of hardware-assisted polling integrity checking systems

IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012) Pub Date : 2012-06-25 DOI: 10.1109/DSN.2012.6263962

Jiang Wang, Kun Sun, A. Stavrou

Citations: 8

Characterization of the error resiliency of power grid substation devices

IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012) Pub Date : 2012-06-25 DOI: 10.1109/DSN.2012.6263924

Kuan-Yu Tseng, Daniel Chen, Z. Kalbarczyk, R. Iyer

Citations: 9

Automatic fault characterization via abnormality-enhanced classification

IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012) Pub Date : 2012-06-25 DOI: 10.1109/DSN.2012.6263926

G. Bronevetsky, I. Laguna, B. Supinski, S. Bagchi

Abstract: Enterprise and high-performance computing systems are growing extremely large and complex, employing many processors and diverse software/hardware stacks. As these machines grow in scale, faults become more frequent and system complexity makes it difficult to detect and to diagnose them. The difficulty is particularly large for faults that degrade system performance or cause erratic behavior but do not cause outright crashes. The cost of these errors is high since they significantly reduce system productivity, both initially and by time required to resolve them. Current system management techniques do not work well since they require manual examination of system behavior and do not identify root causes. When a fault is manifested, system administrators need timely notification about the type of fault, the time period in which it occurred and the processor on which it originated. Statistical modeling approaches can accurately characterize normal and abnormal system behavior. However, the complex effects of system faults are less amenable to these techniques. This paper demonstrates that the complexity of system faults makes traditional classification and clustering algorithms inadequate for characterizing them. We design novel techniques that combine classification algorithms with information on the abnormality of application behavior to improve detection and characterization accuracy significantly. Our experiments demonstrate that our techniques can detect and characterize faults with 85% accuracy, compared to just 12% accuracy for direct applications of traditional techniques.

Citations: 43

High-performance parallel accelerator for flexible and efficient run-time monitoring

IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012) Pub Date : 2012-06-25 DOI: 10.1109/DSN.2012.6263925

Daniel Y. Deng, G. Suh

Citations: 36

Time-efficient and cost-effective network hardening using attack graphs

IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012) Pub Date : 2012-06-25 DOI: 10.1109/DSN.2012.6263942

Massimiliano Albanese, S. Jajodia, S. Noel

Citations: 111

Keep net working - on a dependable and fast networking stack

IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012) Pub Date : 2012-06-25 DOI: 10.1109/DSN.2012.6263933

Tomás Hrubý, Dirk Vogt, H. Bos, A. Tanenbaum

Abstract: For many years, multiserver operating systems have been demonstrating, by their design, high dependability and reliability. However, the design has inherent performance implications which were not easy to overcome. Until now the context switching and kernel involvement in the message passing was the performance bottleneck for such systems to get broader acceptance beyond niche domains. In contrast to other areas of software development where fitting the software to the parallelism is difficult, the new multicore hardware is a great match for the multiserver systems. We can run individual servers on different cores. This opens more room for further decomposition of the existing servers and thus improving dependability and live-updatability. We discuss in general the implications for the multiserver systems design and cover in detail the implementation and evaluation of a more dependable networking stack. We split the single stack into multiple servers which run on dedicated cores and communicate without kernel involvement. We think that the performance problems that have dogged multiserver operating systems since their inception should be reconsidered: it is possible to make multiserver systems fast on multicores.

Citations: 26

Lightweight cooperative logging for fault replication in concurrent programs

IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012) Pub Date : 2012-06-25 DOI: 10.1109/DSN.2012.6263953

Nuno Machado, P. Romano, L. Rodrigues

Citations: 15

Heuristics for optimizing matrix-based erasure codes for fault-tolerant storage systems

IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012) Pub Date : 2012-06-25 DOI: 10.1109/DSN.2012.6263937

J. Plank, Catherine D. Schuman, B. D. Robison

Citations: 22

Epiphany: A location hiding architecture for protecting critical services from DDoS attacks

IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012) Pub Date : 2012-06-25 DOI: 10.1109/DSN.2012.6263945

Vamsi Kambhampati, C. Papadopoulos, D. Massey

Citations: 11

Low-cost program-level detectors for reducing silent data corruptions

IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012) Pub Date : 2012-06-25 DOI: 10.1109/DSN.2012.6263960

S. Hari, S. Adve, Helia Naeimi

Abstract: With technology scaling, transient faults are becoming an increasing threat to hardware reliability. Commodity systems must be made resilient to these in-field faults through very low-cost resiliency solutions. Software-level symptom detection techniques have emerged as promising low-cost and effective solutions. While the current user-visible Silent Data Corruption (SDC) rates for these techniques is relatively low, eliminating or significantly lowering the SDC rate is crucial for these solutions to become practically successful. Identifying and understanding program sections that cause SDCs is crucial to reducing (or eliminating) SDCs in a cost effective manner. This paper provides a detailed analysis of code sections that produce over 90% of SDCs for six applications we studied. This analysis facilitated the development of program-level detectors that catch errors in quantities that are either accumulated or active for a long duration, amortizing the detection costs. These low-cost detectors significantly reduce the dependency on redundancy-based techniques and provide more practical and flexible choice points on the performance vs. reliability trade-off curve. For example, for an average of 90%, 99%, or 100% reduction of the baseline SDC rate, the average execution overheads of our approach versus redundancy alone are respectively 12% vs. 30%, 19% vs. 43%, and 27% vs. 51%.

Citations: 137