Multiple Upper Outlier Detection Procedure in Generalized Exponential Sample

European Journal of Statistics. Pub Date: 2021-10-01. DOI: 10.28924/ada/stat.1.58

A. Singh, Abhinav Singh, Rohit Patawa

Citations: 1

Abstract

Hawkins [6] defined an outlier as an observation that differs so much from the remaining observations in a dataset as to arouse suspicion that it was generated by a different mechanism. Barnett and Lewis [2] defined an outlier as an observation that deviates significantly from the other observations in the sample in which it occurs. Spatial outliers differ from ordinary outliers and have been studied by many authors, such as Singh and Lalitha [9]. Outlier detection procedures for the two-parameter gamma distribution have also been discussed by many authors. One major disadvantage of the gamma distribution, however, is that its distribution (or survival) function cannot be expressed in closed form when the shape parameter is not an integer. Because it is written in terms of the incomplete gamma function, the distribution/survival function or the failure rate must be obtained by numerical integration, which limits the usage of the gamma distribution. It has been observed that the generalized exponential distribution can be used as an alternative to the gamma distribution in many situations. Properties such as the monotonicity of the hazard function and the tail behaviour are quite similar for the two distributions, but the generalized exponential distribution has a compact, closed-form distribution (or survival) function. It has also been observed that for a given gamma distribution there exists a generalized exponential distribution whose distribution function is almost identical. Since the gamma distribution function has no compact form, generating gamma random numbers efficiently is known to be problematic. For all practical purposes, approximate gamma random numbers can instead be generated from the generalized exponential distribution, and the resulting random samples cannot be distinguished by any statistical test.
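The closed-form distribution function is what makes the generalized exponential (GE) distribution convenient here. The following is a minimal sketch that assumes the standard two-parameter form F(x; alpha, lam) = (1 - exp(-lam * x))^alpha for x > 0, with shape alpha and rate lam. This parameterization is our assumption, since the abstract does not state one. Because F can be inverted explicitly, GE variates, and therefore approximate gamma variates, can be drawn by inverse-transform sampling without numerical integration. The function names are hypothetical.

```python
import math
import random


def ge_cdf(x: float, alpha: float, lam: float) -> float:
    """Closed-form CDF of GE(alpha, lam) (assumed parameterization)."""
    if x <= 0:
        return 0.0
    return (1.0 - math.exp(-lam * x)) ** alpha


def ge_quantile(u: float, alpha: float, lam: float) -> float:
    """Explicit inverse of ge_cdf: solves (1 - exp(-lam * x)) ** alpha = u for x."""
    return -math.log(1.0 - u ** (1.0 / alpha)) / lam


def ge_sample(n: int, alpha: float, lam: float, rng: random.Random) -> list[float]:
    """Draw n GE variates by inverse-transform sampling; rng.random() is in [0, 1)."""
    return [ge_quantile(rng.random(), alpha, lam) for _ in range(n)]


# Usage: five GE(2, 1) variates from a seeded generator.
data = ge_sample(5, alpha=2.0, lam=1.0, rng=random.Random(42))
```

When alpha = 1, GE(alpha, lam) reduces to the exponential distribution with rate lam, which gives a quick sanity check of `ge_cdf`. Choosing GE parameters that approximate a particular gamma distribution is a separate fitting step and is not shown here.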
Many authors have proposed a location- and scale-invariant test based on the test statistic Z_k for detecting upper outliers in a two-parameter exponential sample. Kumar et al. [7] and Singh and Lalitha [10] proposed test statistics for detecting multiple upper outliers in a gamma sample. Various test statistics have also been proposed to detect outliers in an exponential sample; among them, Likes [8] proposed a new test statistic for detecting outliers in the exponential case. In this paper, the test statistic proposed by Likes is used to detect outliers in a generalized exponential sample, and its critical values are obtained. A simulation study is carried out to compare the theoretical developments.
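The abstract says that critical values are obtained and a simulation study is carried out, but it does not give the form of Likes' statistic. The following is a minimal Monte Carlo sketch of how an upper critical value for a k-upper-outlier statistic can be simulated under a GE null. It uses a Dixon-type gap ratio (X_(n) - X_(n-k)) / (X_(n) - X_(1)) as a hypothetical stand-in, not as the statistic used in the paper. The assumed GE form is F(x) = (1 - exp(-lam * x))^alpha. The names `gap_ratio` and `critical_value` are illustrative.

```python
import math
import random


def ge_quantile(u: float, alpha: float, lam: float = 1.0) -> float:
    """Inverse CDF of GE(alpha, lam), assuming F(x) = (1 - exp(-lam * x)) ** alpha."""
    return -math.log(1.0 - u ** (1.0 / alpha)) / lam


def gap_ratio(sample: list[float], k: int) -> float:
    """HYPOTHETICAL stand-in statistic for k upper outliers (not Likes' statistic):
    (X_(n) - X_(n-k)) / (X_(n) - X_(1)) on the order statistics."""
    xs = sorted(sample)
    n = len(xs)
    if not 1 <= k < n - 1:
        raise ValueError("need 1 <= k < n - 1")
    spread = xs[-1] - xs[0]
    return 0.0 if spread == 0 else (xs[-1] - xs[n - 1 - k]) / spread


def critical_value(n: int, k: int, alpha: float, level: float = 0.05,
                   reps: int = 20000, seed: int = 1) -> float:
    """Monte Carlo upper critical value of gap_ratio under a GE(alpha, 1) null.

    The ratio is scale invariant, and lam * X ~ GE(alpha, 1) when X ~ GE(alpha, lam),
    so simulating with lam = 1 covers every lam for a given shape alpha."""
    rng = random.Random(seed)
    stats = sorted(
        gap_ratio([ge_quantile(rng.random(), alpha) for _ in range(n)], k)
        for _ in range(reps)
    )
    # Empirical (1 - level) quantile of the simulated null distribution.
    idx = min(reps - 1, max(0, math.ceil((1.0 - level) * reps) - 1))
    return stats[idx]
```

A sample would then be declared to contain k upper outliers at the chosen level when `gap_ratio(data, k) > critical_value(len(data), k, alpha)`. This requires the shape alpha to be known. If alpha is estimated from the same data, the nominal level holds only approximately.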
