A method to simulate multivariate outliers with known Mahalanobis distances for normal and non-normal data
Author: Oscar L. Olvera Astivia
Journal: Methods in Psychology (Online), volume 11, Article 100157
Published: 2024-07-30
DOI: 10.1016/j.metip.2024.100157
URL: https://www.sciencedirect.com/science/article/pii/S2590260124000237
Monte Carlo simulations and theoretical analyses have repeatedly demonstrated the impact of outliers on statistical analysis. Most simulation studies generate outliers using one of two general approaches: by multiplying an arbitrary point by a constant or through a finite mixture. The latter can be extended to multivariate settings by defining the Mahalanobis distance between the centroids of two clusters of points. Nevertheless, when researchers aim to simulate individual data points with population-level Mahalanobis distances, the number of available procedures is very limited. This article generalizes one of the few existing methods to simulate an arbitrary number of outliers in an arbitrary number of dimensions, for both multivariate normal and non-normal data. A small simulation demonstration showcases how this methodology enables new simulation designs that were either unpopular or not possible due to the lack of a data-generating algorithm. A discussion of potential implications highlights the importance of considering multivariate outliers in simulation settings.
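To make the idea of a point with a known population-level Mahalanobis distance concrete, the sketch below uses a standard textbook construction and is not the article's algorithm: draw a random direction on the unit sphere, scale it to the target distance, and map it through the Cholesky factor of the covariance matrix. The function names `mahalanobis` and `points_at_distance`, the values of `mu` and `sigma`, and the target distance of 5 are illustrative assumptions. The sketch covers only the case where the population mean and covariance are known, and it does not address the non-normal case that the article treats.

```python
import numpy as np

def mahalanobis(x, mu, sigma):
    """Population Mahalanobis distance of each row of x from mu under covariance sigma."""
    diff = np.atleast_2d(x) - mu
    # Solve sigma @ z = diff.T rather than inverting sigma explicitly.
    z = np.linalg.solve(sigma, diff.T)
    return np.sqrt(np.sum(diff.T * z, axis=0))

def points_at_distance(mu, sigma, d, n, rng):
    """Draw n points whose Mahalanobis distance from mu equals d (hypothetical helper)."""
    p = len(mu)
    # Random directions, uniform on the unit sphere in R^p.
    u = rng.standard_normal((n, p))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    # Map the sphere of radius d onto the ellipsoid defined by sigma.
    chol = np.linalg.cholesky(sigma)  # sigma = chol @ chol.T
    return mu + d * u @ chol.T

rng = np.random.default_rng(2024)

# Assumed population parameters, chosen only for illustration.
mu = np.zeros(3)
sigma = np.array([[1.0, 0.5, 0.3],
                  [0.5, 1.0, 0.4],
                  [0.3, 0.4, 1.0]])

# 95 regular observations plus 5 outliers at Mahalanobis distance 5.
clean = rng.multivariate_normal(mu, sigma, size=95)
outliers = points_at_distance(mu, sigma, d=5.0, n=5, rng=rng)
data = np.vstack([clean, outliers])

print(mahalanobis(outliers, mu, sigma))
```

Each printed value equals 5 up to floating-point error, because for x = mu + d L u with sigma = L Lᵀ and ‖u‖ = 1 the quadratic form (x − mu)ᵀ sigma⁻¹ (x − mu) reduces to d². The distances are exact with respect to the population `mu` and `sigma`; distances computed from the sample mean and covariance of `data` would generally differ.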