Proceedings of the First Workshop on Machine Learning for Computing Systems: Latest Publications

RACC

Proceedings of the First Workshop on Machine Learning for Computing Systems Pub Date : 2018-06-12 DOI: 10.1145/3217871.3217876

Saurav Nanda, T. Hacker

Abstract: Resource optimization has always been a major challenge in modern data centers. Consolidating workloads onto a minimal number of physical machines has become more complex as data centers have begun supporting containers in addition to virtual machines (VMs). With the increasing use of containers alongside VMs, it becomes critical to address this problem from the container's point of view, that is, to optimally allocate containers to the fewest physical hosts. Depending on the type of application workload or tasks, infrastructure providers may provision separate containers to handle each task. These tasks may have different resource demands: some are CPU intensive, some memory intensive, some I/O intensive, and some network intensive. In addition, the physical machines in the data center are heterogeneous, i.e., their hardware configurations (resource capacities) may differ from one another. Hence, the challenge is to consolidate all active containers, with their different resource requirements, on the minimum number of heterogeneous physical machines. We formulate a multi-resource bin packing problem and propose a deep learning technique called Fit-for-Packing to place a near-optimal number of containers on a physical machine. Experimental results show that our model achieves an average training accuracy of 82.01% and an average testing accuracy of 82.93%.
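The multi-resource bin packing formulation named in the RACC abstract can be illustrated with a classical heuristic. The sketch below is not the authors' Fit-for-Packing model. It is a plain first-fit-decreasing baseline over heterogeneous hosts, and every name in it (`first_fit_decreasing`, the four-element resource tuples, the sample containers and hosts) is a hypothetical chosen for illustration. It assumes resource values are already normalized to comparable units, so that summing them gives a rough size ordering.

```python
from typing import Dict, Optional, Tuple

# Hypothetical resource vector: (cpu, memory, io, network), in normalized units.
Resources = Tuple[float, ...]


def fits(demand: Resources, free: Resources) -> bool:
    """True if every resource dimension of the demand fits in the free capacity."""
    return all(d <= f for d, f in zip(demand, free))


def first_fit_decreasing(
    containers: Dict[str, Resources],
    hosts: Dict[str, Resources],
) -> Dict[str, Optional[str]]:
    """Place containers on as few heterogeneous hosts as this heuristic manages.

    Returns a mapping container -> host name, or None if no host can take it.
    """
    free: Dict[str, Resources] = {}  # remaining capacity of hosts already in use
    placement: Dict[str, Optional[str]] = {}

    # Largest containers first; largest hosts are opened first.
    container_order = sorted(containers, key=lambda c: sum(containers[c]), reverse=True)
    host_order = sorted(hosts, key=lambda h: sum(hosts[h]), reverse=True)

    for name in container_order:
        demand = containers[name]
        # Prefer a host that is already in use.
        chosen = next((h for h in host_order if h in free and fits(demand, free[h])), None)
        if chosen is None:
            # Otherwise open the largest unused host that can hold the container.
            chosen = next(
                (h for h in host_order if h not in free and fits(demand, hosts[h])), None
            )
            if chosen is not None:
                free[chosen] = hosts[chosen]
        if chosen is not None:
            free[chosen] = tuple(f - d for f, d in zip(free[chosen], demand))
        placement[name] = chosen
    return placement


# Made-up sample data, for illustration only.
containers = {
    "web": (2, 1, 1, 1),
    "db": (1, 4, 2, 1),
    "batch": (4, 2, 1, 0),
    "cache": (1, 3, 0, 1),
}
hosts = {"big": (8, 8, 4, 4), "small": (4, 4, 2, 2)}

placement = first_fit_decreasing(containers, hosts)
print(placement)
```

A heuristic like this gives a reference point for the number of hosts used; the abstract's learned approach is evaluated by accuracy figures rather than by this baseline.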

Citations: 1

Diagnosing NFS errors: Preliminary Findings from a Syslog Analysis of Bridges

Proceedings of the First Workshop on Machine Learning for Computing Systems Pub Date : 2018-06-12 DOI: 10.1145/3217871.3217873

P. Choudhary, R. Sooriamurthi, J. R. Scott, Ed Hanna, J. Sommerfield, A. Kar

Citations: 0

Causal Relationships amongst Sensors in the Trinity Supercomputer: work in progress

Proceedings of the First Workshop on Machine Learning for Computing Systems Pub Date : 2018-06-12 DOI: 10.1145/3217871.3217875

Ian Goetting, Elisabeth Baseman, H. Cao

Abstract: HPC systems are inherently complex, both to work with and to maintain. Trying to anticipate a sudden event, such as a component failure or how the system will react to a newly installed module, is too large and convoluted a problem for a single person or group of people to solve manually. In this paper, we explore the causal relationships present amongst sensors and monitoring data found in these kinds of machines. The intent of this study is both to better understand how different components and modules of the machines interact with each other and to better understand how a change in one part of the machine affects another part. To achieve this, we apply both a Bayesian network and logistic regression, in conjunction with a causal graph generator (TETRAD), to sensor data generated from the Trinity supercomputer in Los Alamos, NM. In particular, the data examined in this study came from 4 slot-level sensors and 3 row-level sensors. It was found that, while these sensors do contain causal structure by themselves, they do not seem to make up the entire causal structure, only a portion of it. The presence of latent variables, as well as possibly more interconnections (i.e., causal relationships) between the sensors, is likely affecting the predictive accuracy of the Bayesian network and logistic regression experiments conducted in this study. Therefore, it is recommended that future work conduct more experiments involving more sensors and possibly other relevant data.

Citations: 0

Work in Progress: Topic Modeling for HPC Job State Prediction

Proceedings of the First Workshop on Machine Learning for Computing Systems Pub Date : 2018-06-12 DOI: 10.1145/3217871.3217874

Alexandra DeLucia, Elisabeth Baseman

Citations: 3

Proceedings of the First Workshop on Machine Learning for Computing Systems

Proceedings of the First Workshop on Machine Learning for Computing Systems Pub Date : 2018-06-12 DOI: 10.1145/3217871

Citations: 0

Recurrent Neural Network Attention Mechanisms for Interpretable System Log Anomaly Detection

Proceedings of the First Workshop on Machine Learning for Computing Systems Pub Date : 2018-03-13 DOI: 10.1145/3217871.3217872

Andy Brown, Aaron Tuor, Brian Hutchinson, Nicole Nichols

Abstract: Deep learning has recently demonstrated state-of-the-art performance on key tasks related to the maintenance of computer systems, such as intrusion detection, denial of service attack detection, hardware and software system failures, and malware detection. In these contexts, model interpretability is vital for administrators and analysts to trust and act on the automated analysis of machine learning models. Deep learning methods have been criticized as black box oracles which allow limited insight into decision factors. In this work we seek to bridge the gap between the impressive performance of deep learning models and the need for interpretable model introspection. To this end we present recurrent neural network (RNN) language models augmented with attention for anomaly detection in system logs. Our methods are generally applicable to any computer system and logging source. By incorporating attention variants into our RNN language models we create opportunities for model introspection and analysis without sacrificing state-of-the-art performance. We demonstrate model performance and illustrate model interpretability on an intrusion detection task using the Los Alamos National Laboratory (LANL) cyber security dataset, reporting upward of 0.99 area under the receiver operating characteristic curve despite being trained only on a single day's worth of data.

Citations: 151