Examination of Experimental Data Samples for the Presence of Outliers: Comparison of Methods

T. V. Potanina, I. V. Mykhaylenko
Journal: Integrated Technologies and Energy Saving
Published: 2023-12-06
DOI: 10.20998/2078-5364.2023.3.07 (https://doi.org/10.20998/2078-5364.2023.3.07)

Abstract

Detecting outliers (also described as misses, anomalous values, or results that stand out sharply from the rest) is one of the most relevant, complex, and ambiguous tasks in the processing of experimental material. Outliers are experimental results that lie abnormally far from the other points in a series of parallel observations.

Outliers are often caused by measurement errors, including incorrect recording of results, incorrect coding of data, and incorrect conduct of the experiment. Gross errors arise when the research conditions change suddenly, when equipment malfunctions, and so on.

At the same time, an outlier may indicate unexpected, extraordinary behavior of the measured quantity: a manifestation of a property of the process that has not yet been explained. This is why analysis with reliable mathematical tools is needed.

Methods for detecting outliers are numerous and diverse. Parametric tests are more sensitive to the sample size and to the probability distribution of the population. Non-parametric tests are more flexible and can be applied when the sample is not normally distributed or the sample size is small. They give better results for asymmetric distributions because they use the median instead of the mean, and they can be applied to ordinal or nominal data, as well as when an aberrant outlying value is present.

Interval analysis methods, in particular interval statistics, offer an alternative, flexible toolkit for a more accurate and complete analysis of experimental data under incomplete information, noise, measurement outliers, and anomalous or aberrant points.

The study compares the results of applying parametric criteria (-criterion, -criterion, Lvovskyi), a non-parametric criterion (the box-and-whisker plot), and calculations based on interval statistics. One outlier was identified by the non-parametric criterion, by the -criterion, and by the interval procedure for detecting a single outlier. Two values were flagged as suspected outliers by the box-and-whisker rule and by the interval statistics recognition algorithm.

Outlier detection based on interval analysis is no less effective than non-parametric tests.
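As an illustration of the box-and-whisker rule named in the abstract, the following minimal Python sketch flags values that fall outside the whiskers. It is not the authors' procedure: the abstract gives neither the whisker multiplier nor the quartile convention, so the conventional factor k = 1.5, the "inclusive" quartile method, the function names, and the sample readings are all assumptions made for illustration.

```python
from statistics import quantiles


def whisker_fences(sample, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR.

    k = 1.5 is the conventional box-plot multiplier (an assumption here;
    the abstract does not state which value was used).
    """
    # quantiles() sorts the data itself; n=4 yields Q1, the median, and Q3.
    q1, _median, q3 = quantiles(sample, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def suspected_outliers(sample, k=1.5):
    """Return the values of `sample` lying outside the whisker fences."""
    lower, upper = whisker_fences(sample, k)
    return [x for x in sample if x < lower or x > upper]


# Hypothetical parallel observations, invented for this example.
readings = [10.1, 10.3, 9.9, 10.0, 10.2, 10.4, 9.8, 14.7]
print(suspected_outliers(readings))  # prints [14.7]
```

Because the fences are built from quartiles rather than from the mean and standard deviation, a single extreme reading barely shifts them, which is the robustness property the abstract attributes to median-based criteria.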