大型系统中的机器学习异常检测

2016 IEEE AUTOTESTCON Pub Date : 2016-09-01 DOI:10.1109/AUTEST.2016.7589589

J. Murphree

{"title":"大型系统中的机器学习异常检测","authors":"J. Murphree","doi":"10.1109/AUTEST.2016.7589589","DOIUrl":null,"url":null,"abstract":"We have a need for methods to efficiently determine the health of a system. Diagnostics and prognostics determine system heath through analysis of data from sensors. Anomalies in the data can help us determine if there is a failure or a pending failure. There are common statistical methods to detect anomalies in individual measurements. For systems with many measurements, the anomalies may occur as specific combinations of values. Large systems have various associated states and modes which define the valid measurements. The amount of data to analyze grows very quickly as the system becomes more complex. In recent years techniques have been developed to address large data analysis. Machine Learning encompasses a broad selection of tools to optimize a statistical model of the data. These tools include supervised learning techniques, such as linear regression and logistic regression, in which training data exists to tune the model. Unsupervised learning, such as clustering, is used to explore data which does not have a defined output label associated with inputs data. Standard approaches to training supervised learning systems require a large sample of positive and negative outcome data. Some uses of machine learning involve data where there are very few cases of negative outcomes. There are machine learning algorithms defined as Anomaly Detection which are designed to deal with this type of data. Simple algorithms include Gaussian Distribution Analysis, which assumes independence in distributions of data. Large Systems with anomalies defined in the dependent combinations of data require either a manual creation of combinations of independent variables, or Multivariate Gaussian Distribution Analysis, which does not scale well for large systems. A further complication is the mixture of linear and discrete data. Neural Networks are a type of learning system which has been applied to each of the individual needs addressed above. This paper describes an approach to anomaly detection using neural networks for the specific problems in large systems to efficiently determine system health.","PeriodicalId":314357,"journal":{"name":"2016 IEEE AUTOTESTCON","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"28","resultStr":"{\"title\":\"Machine learning anomaly detection in large systems\",\"authors\":\"J. Murphree\",\"doi\":\"10.1109/AUTEST.2016.7589589\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We have a need for methods to efficiently determine the health of a system. Diagnostics and prognostics determine system heath through analysis of data from sensors. Anomalies in the data can help us determine if there is a failure or a pending failure. There are common statistical methods to detect anomalies in individual measurements. For systems with many measurements, the anomalies may occur as specific combinations of values. Large systems have various associated states and modes which define the valid measurements. The amount of data to analyze grows very quickly as the system becomes more complex. In recent years techniques have been developed to address large data analysis. Machine Learning encompasses a broad selection of tools to optimize a statistical model of the data. These tools include supervised learning techniques, such as linear regression and logistic regression, in which training data exists to tune the model. Unsupervised learning, such as clustering, is used to explore data which does not have a defined output label associated with inputs data. Standard approaches to training supervised learning systems require a large sample of positive and negative outcome data. Some uses of machine learning involve data where there are very few cases of negative outcomes. There are machine learning algorithms defined as Anomaly Detection which are designed to deal with this type of data. Simple algorithms include Gaussian Distribution Analysis, which assumes independence in distributions of data. Large Systems with anomalies defined in the dependent combinations of data require either a manual creation of combinations of independent variables, or Multivariate Gaussian Distribution Analysis, which does not scale well for large systems. A further complication is the mixture of linear and discrete data. Neural Networks are a type of learning system which has been applied to each of the individual needs addressed above. This paper describes an approach to anomaly detection using neural networks for the specific problems in large systems to efficiently determine system health.\",\"PeriodicalId\":314357,\"journal\":{\"name\":\"2016 IEEE AUTOTESTCON\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"28\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 IEEE AUTOTESTCON\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/AUTEST.2016.7589589\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE AUTOTESTCON","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/AUTEST.2016.7589589","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 28

摘要

我们需要有效地确定系统健康状况的方法。诊断和预测通过分析来自传感器的数据来确定系统的健康状况。数据中的异常可以帮助我们确定是否存在故障或即将发生的故障。有常见的统计方法来检测个别测量中的异常。对于具有许多测量值的系统，异常可能以特定值的组合形式出现。大型系统具有各种相关的状态和模态，这些状态和模态定义了有效的测量。随着系统变得越来越复杂，需要分析的数据量也会迅速增长。近年来，已经开发了处理大数据分析的技术。机器学习包含了广泛的工具选择来优化数据的统计模型。这些工具包括监督学习技术，例如线性回归和逻辑回归，其中存在训练数据以调整模型。无监督学习，如聚类，用于探索没有与输入数据关联的定义输出标签的数据。训练监督学习系统的标准方法需要大量的正面和负面结果数据样本。机器学习的一些应用涉及很少有负面结果的数据。有一些机器学习算法被定义为异常检测，专门用于处理这类数据。简单的算法包括高斯分布分析，它假设数据的分布是独立的。在数据的依赖组合中定义异常的大型系统需要手动创建独立变量的组合，或者进行多元高斯分布分析，这对于大型系统来说不是很好。更复杂的是线性和离散数据的混合。神经网络是一种学习系统，它已经应用于上面提到的每个个体需求。本文介绍了一种利用神经网络对大型系统中的特定问题进行异常检测的方法，以有效地确定系统的健康状况。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Machine learning anomaly detection in large systems

We have a need for methods to efficiently determine the health of a system. Diagnostics and prognostics determine system heath through analysis of data from sensors. Anomalies in the data can help us determine if there is a failure or a pending failure. There are common statistical methods to detect anomalies in individual measurements. For systems with many measurements, the anomalies may occur as specific combinations of values. Large systems have various associated states and modes which define the valid measurements. The amount of data to analyze grows very quickly as the system becomes more complex. In recent years techniques have been developed to address large data analysis. Machine Learning encompasses a broad selection of tools to optimize a statistical model of the data. These tools include supervised learning techniques, such as linear regression and logistic regression, in which training data exists to tune the model. Unsupervised learning, such as clustering, is used to explore data which does not have a defined output label associated with inputs data. Standard approaches to training supervised learning systems require a large sample of positive and negative outcome data. Some uses of machine learning involve data where there are very few cases of negative outcomes. There are machine learning algorithms defined as Anomaly Detection which are designed to deal with this type of data. Simple algorithms include Gaussian Distribution Analysis, which assumes independence in distributions of data. Large Systems with anomalies defined in the dependent combinations of data require either a manual creation of combinations of independent variables, or Multivariate Gaussian Distribution Analysis, which does not scale well for large systems. A further complication is the mixture of linear and discrete data. Neural Networks are a type of learning system which has been applied to each of the individual needs addressed above. This paper describes an approach to anomaly detection using neural networks for the specific problems in large systems to efficiently determine system health.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2016 IEEE AUTOTESTCON

自引率

0.00%

发文量