A Method for Modeling Data Anomalies in Practice

2021 47th Euromicro Conference on Software Engineering and Advanced Applications (SEAA). Pub Date: 2021-09-01. DOI: 10.1109/SEAA53835.2021.00024

Jennifer Horkoff, M. Staron, Wilhelm Meding

Citations: 1

Abstract

As technology has allowed us to collect large amounts of industrial data, it has become critical to analyze and understand the data collected, in particular to find data anomalies. Anomaly analysis allows a company to detect, analyze, and understand anomalous or unusual data patterns. This activity is important for understanding, for example, deviations in service that may indicate potential problems, or differing customer behavior that may reveal new business opportunities. Much previous work has focused on anomaly detection, in particular using machine learning. Such approaches cluster data patterns by common attributes; although useful, the clusters often do not correspond to the root causes of anomalies, so further manual analysis is needed. In this paper, we report on a design science study, conducted with two different teams in a partner company, which focuses on modeling and understanding the attributes and root causes of data anomalies. After iteration, we created, for each team, general and anomaly-specific UML class diagrams and goal models to capture anomaly details. We use our experiences to create an example taxonomy that classifies anomalies by their root causes, and to create a general method for modeling and understanding data anomalies. This work paves the way for a better understanding of anomalies and their root causes, leading towards a training set that may be used for machine learning approaches.
