Algorithms of statistical anomalies clearing for data science applications

Oleksii Pysarchuk, Danylo Baran, Yuri Mironov, Illya Pysarchuk
Journal: System research and information technologies
Published: 2023-03-30
DOI: 10.20535/srit.2308-8893.2023.1.06 (https://doi.org/10.20535/srit.2308-8893.2023.1.06)
Citation count: 1

Abstract

The paper considers the nature of the input data used by Data Science algorithms in modern application domains. It then proposes three algorithms designed to remove statistical anomalies from datasets as part of the Data Science pipeline. The main advantages of these algorithms are their relative simplicity and small number of configurable parameters. The parameters are determined by machine learning with respect to the properties of the input data. The algorithms are flexible and do not depend strictly on the nature or origin of the data. The efficiency of the proposed approaches is verified in a modeling experiment using Python implementations of the algorithms. The results are illustrated with plots of the raw and processed datasets. The application of the algorithms is analyzed, and the results are compared.
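The abstract does not describe the three proposed algorithms, so the following is only a generic illustration of what removing statistical anomalies with a small number of configurable parameters can look like. It is a minimal sketch of a sliding-window median/MAD filter, which is an assumption chosen for illustration and not the authors' method. The function name `clean_anomalies` and the parameters `window` and `threshold` are hypothetical, and here they are fixed by hand rather than learned from the data as the paper describes.

```python
import statistics


def clean_anomalies(values, window=5, threshold=3.0):
    """Replace points that deviate strongly from their local neighborhood.

    Hypothetical illustration only: a sliding-window median/MAD filter,
    NOT one of the algorithms proposed in the paper.
    """
    cleaned = list(values)
    half = window // 2
    for i, x in enumerate(values):
        # Neighborhood is taken from the raw data and clipped at the edges.
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        neighborhood = values[lo:hi]
        med = statistics.median(neighborhood)
        # Median absolute deviation: a robust estimate of local spread.
        mad = statistics.median([abs(v - med) for v in neighborhood])
        if mad == 0:
            # Spread is undefined for a flat neighborhood; leave the point as is.
            continue
        if abs(x - med) > threshold * mad:
            # Treat the point as an anomaly and substitute the local median.
            cleaned[i] = med
    return cleaned


values = [10.0, 10.2, 9.9, 10.1, 55.0, 10.3, 9.8, 10.0, 10.2]
cleaned = clean_anomalies(values, window=5, threshold=3.0)
print(cleaned)
```

In this sample only the spike at index 4 is replaced by its local median, and the remaining points are left unchanged. A wider `window` or a larger `threshold` makes the filter more conservative.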