{"title":"Adaptive monitoring in microkernel OSs","authors":"Domenico Cotroneo, Domenico Di Leo, R. Natella","doi":"10.1109/DSNW.2010.5542619","DOIUrl":"https://doi.org/10.1109/DSNW.2010.5542619","url":null,"abstract":"The microkernel architecture has been investigated by both industry and academia for the development of dependable Operating Systems (OSs). This work addresses a relevant issue for this architecture, namely components that become unresponsive because of deadlocks and infinite loops. In particular, a monitor sends heartbeat messages to a component that should reply within a timeout. The timeout choice is tricky, since it should be dynamically adapted to the load conditions of the system. Therefore, our approach is based on an adaptive heartbeat mechanism, in which the timeout is estimated from past response times. We implement and compare three estimation algorithms for the choice of the timeout in the context of the Minix 3 OS. From the analysis we derive useful guidelines for choosing the best algorithm with respect to system requirements.","PeriodicalId":124206,"journal":{"name":"2010 International Conference on Dependable Systems and Networks Workshops (DSN-W)","volume":"2015 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128033630","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accurate fault prediction of BlueGene/P RAS logs via geometric reduction","authors":"Joshua Thompson, D. Dreisigmeyer, T. Jones, M. Kirby, Joshua Ladd","doi":"10.1109/DSNW.2010.5542626","DOIUrl":"https://doi.org/10.1109/DSNW.2010.5542626","url":null,"abstract":"This investigation presents two distinct and novel approaches for the prediction of system failures occurring in Oak Ridge National Laboratory's Blue Gene/P supercomputer. Each technique uses raw numeric and textual subsets of large data logs of physical system information such as fan speeds and CPU temperatures. This data is used to develop models of the system capable of sensing anomalies, or deviations from nominal behavior. Each algorithm predicted anomalies reported in the event log in advance of their occurrence, and one algorithm did so without false positives. Both algorithms also predicted an anomaly that did not appear in the event log; the system administrator later confirmed that this fault had in fact occurred.","PeriodicalId":124206,"journal":{"name":"2010 International Conference on Dependable Systems and Networks Workshops (DSN-W)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128053201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A practical failure prediction with location and lead time for Blue Gene/P","authors":"Ziming Zheng, Z. Lan, Rinku Gupta, S. Coghlan, P. Beckman","doi":"10.1109/DSNW.2010.5542627","DOIUrl":"https://doi.org/10.1109/DSNW.2010.5542627","url":null,"abstract":"Analyzing, understanding and predicting failure is of paramount importance to achieve effective fault management. While various fault prediction methods have been studied in the past, many of them are not practical for use in real systems. In particular, they fail to address two crucial issues: one is to provide location information (i.e., the components where the failure is expected to occur on) and the other is to provide sufficient lead time (i.e., the time interval preceding the time of failure occurrence). In this paper, we first refine the widely-used metrics for evaluating prediction accuracy by including location as well as lead time. We, then, present a practical failure prediction mechanism for IBM Blue Gene systems. A Genetic Algorithm based method is exploited, which takes into consideration the location and the lead time for failure prediction. We demonstrate the effectiveness of this mechanism by means of real failure logs and job logs collected from the IBM Blue Gene/P system at Argonne National Laboratory. Our experiments show that the presented method can significantly improve fault management (e.g., to reduce service unit loss by up to 52.4%) by incorporating location and lead time information in the prediction.","PeriodicalId":124206,"journal":{"name":"2010 International Conference on Dependable Systems and Networks Workshops (DSN-W)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115243573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Qualitative performance control in supervised IT infrastructures","authors":"G. Paljak, Zoltán Égel, D. Tóth, I. Kocsis, T. Kovácsházy, A. Pataricza","doi":"10.1109/DSNW.2010.5542618","DOIUrl":"https://doi.org/10.1109/DSNW.2010.5542618","url":null,"abstract":"Performability control of IT systems still lacks theoretically well-founded approaches that fit well with enterprise system management solutions. We propose a methodology for designing compact qualitative, state-based predictive performability control that uses instrumentation provided by typical system monitoring frameworks. We identify the main systemic insufficiencies of current monitoring tools that hinder designing trustworthy fine-granular controls.","PeriodicalId":124206,"journal":{"name":"2010 International Conference on Dependable Systems and Networks Workshops (DSN-W)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127228458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distributed object storage rebuild analysis via simulation with GOBS","authors":"J. Wozniak, S. Son, R. Ross","doi":"10.1109/DSNW.2010.5542624","DOIUrl":"https://doi.org/10.1109/DSNW.2010.5542624","url":null,"abstract":"Community acceptance of the object storage device model as represented by standards and use in existing HPC filesystems has enabled the development of more complex data storage systems. Object replicas may be placed in a variety of ways to obtain various properties, such as scalable lookup times, concurrent access to multiple objects, and efficient reorganization. The construction of a fully functional object-based parallel filesystem is an enormous effort, so evaluation of potential techniques and algorithms is typically performed by analysis or simulation. In this work, we present an extensible simulator designed to evaluate multiple object placement models under fault-induced rebuilds. We use results obtained by the simulator to weigh the benefits of simple object replica placement models.","PeriodicalId":124206,"journal":{"name":"2010 International Conference on Dependable Systems and Networks Workshops (DSN-W)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128181332","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rejuvenation with workload migration","authors":"R. Hanmer, V. Mendiratta","doi":"10.1109/DSNW.2010.5542617","DOIUrl":"https://doi.org/10.1109/DSNW.2010.5542617","url":null,"abstract":"A five-state model of software rejuvenation is introduced that divides the working state into three sub-states: working, vulnerable, and preparing. The preparing state models the period during which workload is drained from software elements that are about to be rejuvenated. We compared our model to a four-state model that only has working and vulnerable sub-states.","PeriodicalId":124206,"journal":{"name":"2010 International Conference on Dependable Systems and Networks Workshops (DSN-W)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130148737","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SCIT and IDS architectures for reduced data ex-filtration","authors":"Ajay Nagarajan, A. Sood","doi":"10.1109/DSNW.2010.5542601","DOIUrl":"https://doi.org/10.1109/DSNW.2010.5542601","url":null,"abstract":"Today's approach to security is based on perimeter defense and relies heavily on firewalls, intrusion detection systems (IDS) and intrusion prevention systems. Despite years of research and investment in developing such reactive security methodologies, our critical systems remain vulnerable to cyber attacks. In our approach we assume that intrusions are inevitable and our effort is focused on minimizing losses. Towards this end we have introduced a recovery-based, limited-exposure-time system called Self Cleansing Intrusion Tolerance (SCIT). In this paper, we investigate architectures that combine the SCIT architecture with existing IDS approaches. The effectiveness of SCIT and IDS security architectures in terms of minimizing data ex-filtration losses is analyzed using decision trees, and the results of Monte Carlo simulation are presented.","PeriodicalId":124206,"journal":{"name":"2010 International Conference on Dependable Systems and Networks Workshops (DSN-W)","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133516082","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Survivability and information assurance in the cloud","authors":"Melvin Greer","doi":"10.1109/DSNW.2010.5542595","DOIUrl":"https://doi.org/10.1109/DSNW.2010.5542595","url":null,"abstract":"The threat landscape facing the Federal Government is growing, from underground cybercrime economy and burgeoning malware production to rumors of cyber war. Business leaders and security professionals focused on this threat landscape and evaluating cloud computing advantages also need to address cloud computing's unique survivability and information assurance risks.","PeriodicalId":124206,"journal":{"name":"2010 International Conference on Dependable Systems and Networks Workshops (DSN-W)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127166695","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast entropy based alert detection in super computer logs","authors":"A. Makanju, A. N. Zincir-Heywood, E. Milios","doi":"10.1109/DSNW.2010.5542621","DOIUrl":"https://doi.org/10.1109/DSNW.2010.5542621","url":null,"abstract":"The task of alert detection in event logs is very important in preventing or recovering from downtime events. The ability to do this automatically and accurately provides significant savings in the time and cost of downtime events. The Nodeinfo algorithm, which is currently in production use at Sandia National Laboratories, is an entropy based algorithm for alert detection in event logs. Automatic alert detection needs to be fast for it to be practical in a production environment. In this work we show that with Message Type Indexing (MTI) the computational effort required for alert detection can be reduced by up to 99%. This can be achieved without a drop in detection performance. Our proposed method has special significance because it provides a framework for alert detection which requires little or no human input, due to message type extraction required for MTI being carried out automatically using the Iterative Partitioning Log Mining (IPLoM) algorithm.","PeriodicalId":124206,"journal":{"name":"2010 International Conference on Dependable Systems and Networks Workshops (DSN-W)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114972920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analysis of the effect of Java software faults on security vulnerabilities and their detection by commercial web vulnerability scanner tool","authors":"Tânia Basso, Plinio Cesar Simoes Fernandes, M. Jino, Regina L. O. Moraes","doi":"10.1109/DSNW.2010.5542602","DOIUrl":"https://doi.org/10.1109/DSNW.2010.5542602","url":null,"abstract":"Most software systems developed nowadays are highly complex and subject to strict time constraints, and are often deployed with critical software faults. In many cases, software faults are responsible for security vulnerabilities which are exploited by hackers. Automatic web vulnerability scanners can help to locate these vulnerabilities. Trustworthiness of the results that these tools provide is important; hence, relevance of the results must be assessed. We analyze the effect on security vulnerabilities of Java software faults injected into the source code of Web applications. We assess how these faults affect the behavior of the vulnerability scanner tool, to validate the results of its application. Software fault injection techniques and attack tree models were used to support the experiments. The injected software faults influenced the application behavior and, consequently, the behavior of the scanner tool. The high percentage of uncovered vulnerabilities, as well as of false positives, points out the limitations of the tool.","PeriodicalId":124206,"journal":{"name":"2010 International Conference on Dependable Systems and Networks Workshops (DSN-W)","volume":"123 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121715280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}