Comparison between Statistical Approaches and Data Mining Algorithms for Outlier Detection

CAUCHY: Jurnal Matematika Murni dan Aplikasi. Pub Date: 2024-05-16. DOI: 10.18860/ca.v9i1.25450

Annisa Putri Utami, Anwar Fitrianto, K. Notodiputro


Abstract

Outliers are observations that differ markedly from the bulk of the data. Their presence can distort an analysis, yet they may also carry information that is valuable for other research, so identifying outliers before conducting data analysis is crucial. Outlier detection methods were pioneered by researchers in statistics. However, because rapid technological advances have made it easy to collect large datasets, the development of outlier detection techniques is now handled mainly by researchers in computer science (data mining), who rely on computing resources. This research uses simulation studies to compare methods for identifying multiple outliers, based on statistical approaches and on data mining algorithms, across various predetermined data scenarios. In the scenarios examined, the statistical outlier detection methods generally performed better than the data mining-based methods. A suggestion for further research is to improve the data mining methods by paying more attention to statistical analysis, in addition to computing time, so that outlier detection becomes both faster and more precise.
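The kind of comparison described above can be sketched in a few lines of Python. The abstract does not name the methods or scenarios used in the study, so everything below is an illustrative assumption: Tukey's IQR fences stand in for a statistical approach, a k-nearest-neighbour distance score stands in for a data mining algorithm, and the function names, the contamination scheme, and all parameter values are hypothetical.

```python
import random
import statistics


def simulate(n=200, n_outliers=10, shift=8.0, seed=0):
    """Draw n standard-normal values and shift the first n_outliers by `shift`.

    Hypothetical scenario for illustration; not a scenario from the paper.
    """
    rng = random.Random(seed)
    data = [rng.gauss(0.0, 1.0) for _ in range(n)]
    truth = [False] * n
    for i in range(n_outliers):
        data[i] += shift
        truth[i] = True
    return data, truth


def iqr_rule(data, k=1.5):
    """Statistical stand-in: flag values outside Tukey's fences."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x < lower or x > upper for x in data]


def knn_distance_rule(data, k=5, top_fraction=0.05):
    """Data mining stand-in: score each point by the distance to its k-th
    nearest neighbour and flag the highest-scoring fraction of points.
    Ties at the cutoff are all flagged, so slightly more points may be flagged.
    """
    scores = []
    for i, x in enumerate(data):
        dists = sorted(abs(x - y) for j, y in enumerate(data) if j != i)
        scores.append(dists[k - 1])
    n_flag = max(1, round(top_fraction * len(data)))
    cutoff = sorted(scores, reverse=True)[n_flag - 1]
    return [s >= cutoff for s in scores]


def precision_recall(flags, truth):
    """Precision and recall of the flagged points against the true labels."""
    tp = sum(1 for f, t in zip(flags, truth) if f and t)
    fp = sum(1 for f, t in zip(flags, truth) if f and not t)
    fn = sum(1 for f, t in zip(flags, truth) if not f and t)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    data, truth = simulate()
    for name, rule in [("IQR fences", iqr_rule), ("kNN distance", knn_distance_rule)]:
        p, r = precision_recall(rule(data), truth)
        print(f"{name}: precision={p:.2f}, recall={r:.2f}")
```

A fuller simulation study would repeat this over many seeds, sample sizes, and contamination patterns and then average the metrics. Because the planted outliers here form a tight cluster, a neighbour-distance score with small `k` may miss them, which is one way the choice of scenario can favour one family of methods over the other.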
