{"title":"High-performance parallel accelerator for flexible and efficient run-time monitoring","authors":"Daniel Y. Deng, G. Suh","doi":"10.1109/DSN.2012.6263925","DOIUrl":"https://doi.org/10.1109/DSN.2012.6263925","url":null,"abstract":"This paper proposes Harmoni, a high performance hardware accelerator architecture that can support a broad range of run-time monitoring and bookkeeping functions. Unlike custom hardware, which offers very little configurability after it has been fabricated, Harmoni is highly configurable and can allow a wide range of different hardware monitoring and bookkeeping functions to be dynamically added to a processing core even after the chip has already been fabricated. The Harmoni architecture achieves much higher efficiency than software implementations and previously proposed monitoring platforms by closely matching the common characteristics of run-time monitoring functions that are based on the notion of tagging. We implemented an RTL prototype of Harmoni and evaluated it with several example monitoring functions for security and programmability. The prototype demonstrates that the architecture can support a wide range of monitoring functions with different characteristics. Harmoni takes moderate silicon area, has very high throughput, and incurs low overheads on monitored programs.","PeriodicalId":236791,"journal":{"name":"IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012)","volume":"33 7-8","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114013855","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A dependability analysis of hardware-assisted polling integrity checking systems","authors":"Jiang Wang, Kun Sun, A. Stavrou","doi":"10.1109/DSN.2012.6263962","DOIUrl":"https://doi.org/10.1109/DSN.2012.6263962","url":null,"abstract":"Due to performance constraints, host intrusion detection defenses depend on event and polling-based tamper-proof mechanisms to detect security breaches. These defenses monitor the state of critical software components in an attempt to discover any deviations from a pristine or expected state. The rate and type of checks depend can be both periodic and event-based, for instance triggered by hardware events. In this paper, we demonstrate that all software and hardware-assisted defenses that analyze non-contiguous state to infer intrusions are fundamentally vulnerable to a new class of attacks, we call “evasion attacks”. We detail two categories of evasion attacks: directly-intercepting the defense triggering mechanism and indirectly inferring its periodicity. We show that evasion attacks are applicable to a wide-range of protection mechanisms and we analyze their applicability in recent state-of-the-art hardware-assisted protection mechanisms. Finally, we quantify the performance of implemented proof-of-concept prototypes for all of the attacks and suggest potential countermeasures.","PeriodicalId":236791,"journal":{"name":"IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124785208","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Time-efficient and cost-effective network hardening using attack graphs","authors":"Massimiliano Albanese, S. Jajodia, S. Noel","doi":"10.1109/DSN.2012.6263942","DOIUrl":"https://doi.org/10.1109/DSN.2012.6263942","url":null,"abstract":"Attack graph analysis has been established as a powerful tool for analyzing network vulnerability. However, previous approaches to network hardening look for exact solutions and thus do not scale. Further, hardening elements have been treated independently, which is inappropriate for real environments. For example, the cost for patching many systems may be nearly the same as for patching a single one. Or patching a vulnerability may have the same effect as blocking traffic with a firewall, while blocking a port may deny legitimate service. By failing to account for such hardening interdependencies, the resulting recommendations can be unrealistic and far from optimal. Instead, we formalize the notion of hardening strategy in terms of allowable actions, and define a cost model that takes into account the impact of interdependent hardening actions. We also introduce a near-optimal approximation algorithm that scales linearly with the size of the graphs, which we validate experimentally.","PeriodicalId":236791,"journal":{"name":"IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122969140","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic fault characterization via abnormality-enhanced classification","authors":"G. Bronevetsky, I. Laguna, B. Supinski, S. Bagchi","doi":"10.1109/DSN.2012.6263926","DOIUrl":"https://doi.org/10.1109/DSN.2012.6263926","url":null,"abstract":"Enterprise and high-performance computing systems are growing extremely large and complex, employing many processors and diverse software/hardware stacks. As these machines grow in scale, faults become more frequent and system complexity makes it difficult to detect and to diagnose them. The difficulty is particularly large for faults that degrade system performance or cause erratic behavior but do not cause outright crashes. The cost of these errors is high since they significantly reduce system productivity, both initially and by time required to resolve them. Current system management techniques do not work well since they require manual examination of system behavior and do not identify root causes. When a fault is manifested, system administrators need timely notification about the type of fault, the time period in which it occurred and the processor on which it originated. Statistical modeling approaches can accurately characterize normal and abnormal system behavior. However, the complex effects of system faults are less amenable to these techniques. This paper demonstrates that the complexity of system faults makes traditional classification and clustering algorithms inadequate for characterizing them. We design novel techniques that combine classification algorithms with information on the abnormality of application behavior to improve detection and characterization accuracy significantly. Our experiments demonstrate that our techniques can detect and characterize faults with 85% accuracy, compared to just 12% accuracy for direct applications of traditional techniques.","PeriodicalId":236791,"journal":{"name":"IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012)","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114954299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kuan-Yu Tseng, Daniel Chen, Z. Kalbarczyk, R. Iyer
{"title":"Characterization of the error resiliency of power grid substation devices","authors":"Kuan-Yu Tseng, Daniel Chen, Z. Kalbarczyk, R. Iyer","doi":"10.1109/DSN.2012.6263924","DOIUrl":"https://doi.org/10.1109/DSN.2012.6263924","url":null,"abstract":"With the advent of modern technologies, microprocessor-based devices are used to monitor and control critical infrastructures, e.g., electric power grids, oil and gas distribution. However, the security and reliability of these microprocessor-based systems is a significant issue, since they are more susceptible to transient errors and malicious attacks. An error in one of these systems could have a cascading and catastrophic impact on the whole infrastructure. This paper explores the error resiliency of power grid substation devices. A software-implemented fault injection technique is used to induce errors/faults inside devices used in power grid substations. The goal is to test the ability of these systems to compute through errors/faults. Our results demonstrate that a single error in a substation device may render the operator in the control center unable to control the operation of a relay in the substation.","PeriodicalId":236791,"journal":{"name":"IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012)","volume":"236 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133268675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Keep net working - on a dependable and fast networking stack","authors":"Tomás Hrubý, Dirk Vogt, H. Bos, A. Tanenbaum","doi":"10.1109/DSN.2012.6263933","DOIUrl":"https://doi.org/10.1109/DSN.2012.6263933","url":null,"abstract":"For many years, multiserver1 operating systems have been demonstrating, by their design, high dependability and reliability. However, the design has inherent performance implications which were not easy to overcome. Until now the context switching and kernel involvement in the message passing was the performance bottleneck for such systems to get broader acceptance beyond niche domains. In contrast to other areas of software development where fitting the software to the parallelism is difficult, the new multicore hardware is a great match for the multiserver systems. We can run individual servers on different cores. This opens more room for further decomposition of the existing servers and thus improving dependability and live-updatability. We discuss in general the implications for the multiserver systems design and cover in detail the implementation and evaluation of a more dependable networking stack. We split the single stack into multiple servers which run on dedicated cores and communicate without kernel involvement. We think that the performance problems that have dogged multiserver operating systems since their inception should be reconsidered: it is possible to make multiserver systems fast on multicores.","PeriodicalId":236791,"journal":{"name":"IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129198123","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lightweight cooperative logging for fault replication in concurrent programs","authors":"Nuno Machado, P. Romano, L. Rodrigues","doi":"10.1109/DSN.2012.6263953","DOIUrl":"https://doi.org/10.1109/DSN.2012.6263953","url":null,"abstract":"This paper presents CoopREP, a system that provides support for fault replication of concurrent programs, based on cooperative recording and partial log combination. CoopREP employs partial recording to reduce the amount of information that a given program instance is required to store in order to support deterministic replay. This allows to substantially reduce the overhead imposed by the instrumentation of the code, but raises the problem of finding the combination of logs capable of replaying the fault. CoopREP tackles this issue by introducing several innovative statistical analysis techniques aimed at guiding the search of partial logs to be combined and used during the replay phase. CoopREP has been evaluated using both standard benchmarks for multi-threaded applications and a real-world application. The results highlight that CoopREP can successfully replay concurrency bugs involving tens of thousands of memory accesses, reducing logging overhead with respect to state of the art non-cooperative logging schemes by up to 50 times in computationally intensive applications.","PeriodicalId":236791,"journal":{"name":"IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130386013","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Heuristics for optimizing matrix-based erasure codes for fault-tolerant storage systems","authors":"J. Plank, Catherine D. Schuman, B. D. Robison","doi":"10.1109/DSN.2012.6263937","DOIUrl":"https://doi.org/10.1109/DSN.2012.6263937","url":null,"abstract":"Large scale, archival and wide-area storage systems use erasure codes to protect users from losing data due to the inevitable failures that occur. All but the most basic erasure codes employ bit-matrices so that encoding and decoding may be effected solely with the bitwise exclusive-OR (XOR) operation. There are CPU savings that can result from strategically scheduling these XOR operations so that fewer XOR's are performed. It is an open problem to derive a schedule from a bit-matrix that minimizes the number of XOR operations. We attack this open problem, deriving two new heuristics called Uber-CHRS and X-Sets to schedule encoding and decoding bit-matrices with reduced XOR operations. We evaluate these heuristics in a variety of realistic erasure coding settings and demonstrate that they are a significant improvement over previously published heuristics. We provide an open-source implementation of these heuristics so that practitioners may leverage our work.","PeriodicalId":236791,"journal":{"name":"IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114972173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Epiphany: A location hiding architecture for protecting critical services from DDoS attacks","authors":"Vamsi Kambhampati, C. Papadopoulos, D. Massey","doi":"10.1109/DSN.2012.6263945","DOIUrl":"https://doi.org/10.1109/DSN.2012.6263945","url":null,"abstract":"Critical services operating over the Internet are increasingly threatened by Distributed Denial of Service (DDoS) attacks. To protect them we propose Epiphany, an architecture that hides the service IP addresses so that attackers cannot locate and target them. Epiphany provides service access through numerous lightweight proxies, presenting a wide target to the attacker. Epiphany has strong location hiding properties; no proxy knows the service address. Instead, proxies communicate over ephemeral paths controlled by the service. If a specific proxy misbehaves or is attacked it can be promptly removed. Epiphany separates proxies into setup and data, and only makes setup proxies public, but these use anycast to create distinct network regions. Clients in clean networks are not affected by attackers in other networks. Data proxies are assigned to clients based on their trust. We evaluate the defense properties of Epiphany using simulations and implementations on PlanetLab and a router testbed.","PeriodicalId":236791,"journal":{"name":"IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129634436","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Low-cost program-level detectors for reducing silent data corruptions","authors":"S. Hari, S. Adve, Helia Naeimi","doi":"10.1109/DSN.2012.6263960","DOIUrl":"https://doi.org/10.1109/DSN.2012.6263960","url":null,"abstract":"With technology scaling, transient faults are becoming an increasing threat to hardware reliability. Commodity systems must be made resilient to these in-field faults through very low-cost resiliency solutions. Software-level symptom detection techniques have emerged as promising low-cost and effective solutions. While the current user-visible Silent Data Corruption (SDC) rates for these techniques is relatively low, eliminating or significantly lowering the SDC rate is crucial for these solutions to become practically successful. Identifying and understanding program sections that cause SDCs is crucial to reducing (or eliminating) SDCs in a cost effective manner. This paper provides a detailed analysis of code sections that produce over 90% of SDCs for six applications we studied. This analysis facilitated the development of program-level detectors that catch errors in quantities that are either accumulated or active for a long duration, amortizing the detection costs. These low-cost detectors significantly reduce the dependency on redundancy-based techniques and provide more practical and flexible choice points on the performance vs. reliability trade-off curve. For example, for an average of 90%, 99%, or 100% reduction of the baseline SDC rate, the average execution overheads of our approach versus redundancy alone are respectively 12% vs. 30%, 19% vs. 43%, and 27% vs. 51%.","PeriodicalId":236791,"journal":{"name":"IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132130404","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}