Comparison between Statistical Approaches and Data Mining Algorithms for Outlier Detection

Annisa Putri Utami, Anwar Fitrianto, K. Notodiputro
Journal: CAUCHY: Jurnal Matematika Murni dan Aplikasi
Publication type: Journal Article
Publication date: 2024-05-16
DOI: 10.18860/ca.v9i1.25450 (https://doi.org/10.18860/ca.v9i1.25450)

Abstract

Outliers are observations that differ markedly from the majority of the data. They can distort an analysis, yet they can also carry information that matters for other research questions, so identifying them before analysis is crucial. Outlier detection techniques were pioneered by researchers in statistics. However, as rapid technological advances have made extensive data easy to collect, the development of these techniques is now driven mainly by researchers in computer science (data mining), who rely on computing facilities. This study uses simulation to compare statistical approaches and data mining algorithms for identifying multiple outliers across a range of predetermined data scenarios. In the scenarios considered, the statistical methods generally performed better than the data mining-based methods. Suggested further research is to improve the data mining methods by giving more attention to statistical analysis, rather than to data processing time alone, so that outlier detection becomes both faster and more precise.
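To make the kind of comparison described above concrete, the following is a minimal sketch of a single simulated scenario. The abstract does not name the methods or scenarios used in the study, so everything here is an assumption chosen for illustration: Tukey's fences (the 1.5 × IQR rule) stand in for a statistical approach, a k-th-nearest-neighbour distance score stands in for a data mining approach, and the sample size, planted outlier values, `k`, and `n_flag` are hypothetical. It does not reproduce the paper's results.

```python
import random
import statistics


def tukey_fences(values, k=1.5):
    """Statistical rule (assumed stand-in): flag indices outside
    [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return {i for i, v in enumerate(values) if v < lo or v > hi}


def knn_distance(values, k=5, n_flag=5):
    """Distance-based rule (assumed stand-in): flag the n_flag points
    with the largest distance to their k-th nearest neighbour."""
    scores = []
    for i, v in enumerate(values):
        dists = sorted(abs(v - w) for j, w in enumerate(values) if j != i)
        scores.append((dists[k - 1], i))
    scores.sort(reverse=True)
    return {i for _, i in scores[:n_flag]}


def precision_recall(flagged, truth):
    """Share of flagged points that are true outliers, and share of
    true outliers that were flagged."""
    hits = len(flagged & truth)
    precision = hits / len(flagged) if flagged else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical scenario: 95 standard-normal inliers plus 5 planted outliers.
rng = random.Random(0)
inliers = [rng.gauss(0.0, 1.0) for _ in range(95)]
planted = [8.0, 9.5, 11.0, -8.5, -10.0]
data = inliers + planted
truth = set(range(len(inliers), len(data)))  # indices of planted outliers

stat_precision, stat_recall = precision_recall(tukey_fences(data), truth)
knn_precision, knn_recall = precision_recall(knn_distance(data), truth)

print(f"Tukey fences: precision={stat_precision:.2f} recall={stat_recall:.2f}")
print(f"k-NN distance: precision={knn_precision:.2f} recall={knn_recall:.2f}")
```

A full simulation study would repeat this over many random replicates and scenarios (different contamination rates, outlier magnitudes, and distributions) and average the metrics. Note that `n_flag` gives the distance-based rule prior knowledge of how many outliers to expect, which the fence rule does not need; that asymmetry is one of the design choices such a comparison has to account for.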