An extended w-test for outlier diagnostics in linear models

IF 3.9 · Zone 2 (Earth Sciences) · JCR Q1 (Geochemistry & Geophysics)
Yangkang Yu, Ling Yang, Yunzhong Shen
Journal of Geodesy. Published 2024-06-18. DOI: 10.1007/s00190-024-01855-0 (https://doi.org/10.1007/s00190-024-01855-0)
Citations: 0

Abstract

Outliers have long been a research focus in geodesy. Data snooping and its iterative form, iterative data snooping (IDS), are based on a statistical test known as the w-test and are commonly used to diagnose outliers in linear models. With multiple outliers, however, they may suffer from masking and swamping effects, which limit their detection and identification capabilities. This contribution investigates the cause of the masking and swamping effects and proposes a new method to mitigate them. First, based on a division of the data, an extended form of the w-test with its reliability measure is presented, and a theoretical reinterpretation of data snooping and IDS is provided. Then, to alleviate masking and swamping, a new outlier diagnostic method and its iterative form are proposed, namely data refining and iterative data refining (IDR). In general, if the observations are initially divided into an inlying set and an outlying set, data snooping can be regarded as a process of moving outliers from the inlying set to the outlying set. Conversely, data refining is the reverse process: it transfers inliers from the outlying set to the inlying set. Both theoretical analysis and practical examples show that IDR is more robust than IDS because it alleviates the masking and swamping effects, although it may pose a higher risk of precision loss when data are insufficient.
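For readers unfamiliar with the baseline that the abstract builds on, the following is a minimal sketch of the classical (Baarda-style) w-test and iterative data snooping for a linear model y = A x + e with known covariance Q. It is not the extended w-test or the IDR procedure proposed in the paper, whose details the abstract does not give. The function names `w_statistics` and `iterative_data_snooping`, and the default critical value `k = 3.29` (two-sided standard normal, significance level 0.001), are assumptions chosen for illustration.

```python
import numpy as np


def w_statistics(A, y, Q):
    """Classical w-test statistic for each observation (sketch).

    Assumes the model y = A x + e with e ~ N(0, Q) and Q known.
    w_i = (P e_hat)_i / sqrt((P Q_e P)_ii), with P = Q^-1.
    """
    P = np.linalg.inv(Q)                      # weight matrix
    N = A.T @ P @ A                           # normal matrix
    x_hat = np.linalg.solve(N, A.T @ P @ y)   # least-squares estimate
    e_hat = y - A @ x_hat                     # residuals
    Q_e = Q - A @ np.linalg.solve(N, A.T)     # residual covariance
    return (P @ e_hat) / np.sqrt(np.diag(P @ Q_e @ P))


def iterative_data_snooping(A, y, Q, k=3.29):
    """Move the observation with the largest |w| to the outlying set
    until no |w| exceeds the critical value k (illustrative default)."""
    inliers = list(range(len(y)))
    outliers = []
    # Keep at least one redundant observation so the test is defined.
    while len(inliers) > A.shape[1]:
        idx = np.array(inliers)
        w = w_statistics(A[idx], y[idx], Q[np.ix_(idx, idx)])
        j = int(np.argmax(np.abs(w)))
        if abs(w[j]) <= k:
            break
        outliers.append(inliers.pop(j))
    return inliers, outliers


if __name__ == "__main__":
    # Hypothetical data: a straight line with one gross error.
    t = np.arange(10.0)
    A = np.column_stack([np.ones_like(t), t])
    y = 2.0 + 0.5 * t
    y[4] += 10.0
    print(iterative_data_snooping(A, y, np.eye(len(t))))
```

Each pass tests only the single most suspicious observation against a model fitted to all remaining ones, which is why several simultaneous outliers can mask one another or swamp good observations, the limitation the abstract describes.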


Source journal: Journal of Geodesy
Category: Earth Sciences, Geochemistry & Geophysics
CiteScore: 8.60
Self-citation rate: 9.10%
Articles published: 85
Review time: 9 months
Journal description: The Journal of Geodesy is an international journal concerned with the study of scientific problems of geodesy and related interdisciplinary sciences. Peer-reviewed papers are published on theoretical or modeling studies, and on results of experiments and interpretations. Besides original research papers, the journal includes commissioned review papers on topical subjects and special issues arising from chosen scientific symposia or workshops. The journal covers the whole range of geodetic science and reports on theoretical and applied studies in research areas such as positioning, reference frame, geodetic networks, modeling and quality control, space geodesy, remote sensing, gravity fields, and geodynamics.